The consultant will be responsible for the following tasks:
- Collaborate with cross-functional teams (including Data Scientists, Analysts, and Developers) to define data requirements, pipeline design, and solutions.
- Design, implement, and maintain scalable ETL pipelines using AWS Glue (Spark), Python, and PySpark (a minimal sketch follows this list).
- Manage complex data workflows and dependencies, including AWS Lambda-based steps, using Airflow (or similar orchestration tools).
- Build and maintain cloud-native, scalable, and cost-effective data infrastructure on AWS, ensuring performance optimization.
- Integrate and optimize data flows across various AWS services such as S3 (Glue Catalog, Athena), Aurora Postgres, Redshift, and Iceberg tables.
- Ensure data quality, governance, and compliance standards are met across all data processes.
- Take ownership of the end-to-end lifecycle of data pipelines, from design to deployment and monitoring.
- Collaborate closely with Data Science teams to operationalize models, leveraging AWS SageMaker where applicable.
- Ensure strong documentation practices and follow best practices for scalability, maintainability, and cost management.
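For illustration, a minimal AWS Glue (PySpark) job of the kind described above might look like the sketch below. This is an assumption-laden example, not part of the role description: the database, table, bucket, and column names are placeholders.

```python
# Hypothetical minimal AWS Glue (PySpark) job: read a table registered in the
# Glue Data Catalog, apply a simple transformation, and write partitioned
# Parquet back to S3. All names and paths are placeholders.
import sys
from awsglue.utils import getResolvedOptions
from awsglue.context import GlueContext
from awsglue.job import Job
from pyspark.context import SparkContext
from pyspark.sql import functions as F

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
sc = SparkContext()
glue_context = GlueContext(sc)
spark = glue_context.spark_session
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Read raw data from the Glue Data Catalog (placeholder database/table).
raw = glue_context.create_dynamic_frame.from_catalog(
    database="raw_db", table_name="events"
).toDF()

# Example transformation: normalize types, derive a partition column, dedupe.
curated = (
    raw.withColumn("event_ts", F.to_timestamp("event_ts"))
       .withColumn("event_date", F.to_date("event_ts"))
       .dropDuplicates(["event_id"])
)

# Write curated data back to S3 as partitioned Parquet (placeholder bucket).
curated.write.mode("overwrite").partitionBy("event_date").parquet(
    "s3://example-curated-bucket/events/"
)

job.commit()
```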
Qualifications:
- Master's degree in Data
- 5 years of experience in a similar role
Mandatory Hard skills:
Python & Spark
- Proficiency in Python (including Pandas) for data transformations using a TDD approach (see the sketch after this block).
- Hands-on experience with Apache Spark, ideally via AWS Glue.
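As an illustration of the TDD-style Pandas work mentioned above, a minimal sketch; the function, column, and test names are hypothetical:

```python
# Hypothetical test-driven pandas transformation: a small pure function plus
# a pytest-style test written alongside (or before) the implementation.
import pandas as pd

def add_total_price(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with a total_price column (quantity * unit_price)."""
    out = df.copy()
    out["total_price"] = out["quantity"] * out["unit_price"]
    return out

def test_add_total_price():
    df = pd.DataFrame({"quantity": [2, 3], "unit_price": [10.0, 5.0]})
    result = add_total_price(df)
    assert result["total_price"].tolist() == [20.0, 15.0]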
Cloud Services (AWS)
- Experience with AWS services such as S3, Glue, Athena, Redshift, Aurora, Lambda, IAM, and EventBridge (a brief example follows this block).
- Comfortable with cloud-based architecture, serverless design, and deployment strategies.
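A brief, hypothetical example of working with these services from Python via boto3, here running an Athena query against a Glue Catalog database; the database name, query, and S3 output location are placeholders:

```python
# Hypothetical boto3 sketch: start an Athena query and poll until it finishes.
import time
import boto3

athena = boto3.client("athena", region_name="eu-west-1")

execution = athena.start_query_execution(
    QueryString="SELECT event_date, COUNT(*) FROM events GROUP BY event_date",
    QueryExecutionContext={"Database": "curated_db"},
    ResultConfiguration={"OutputLocation": "s3://example-athena-results/"},
)

query_id = execution["QueryExecutionId"]
while True:
    state = athena.get_query_execution(QueryExecutionId=query_id)[
        "QueryExecution"]["Status"]["State"]
    if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
        break
    time.sleep(2)

if state == "SUCCEEDED":
    rows = athena.get_query_results(QueryExecutionId=query_id)["ResultSet"]["Rows"]
```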
Data Workflow Orchestration
- Experience in building and managing DAGs in Airflow (or similar tools), as sketched after this block.
- Familiarity with Lambda-based event-driven architectures.
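A minimal Airflow DAG sketch of the kind referred to above, assuming a recent Airflow 2.x installation with the Amazon provider package available; the DAG, Glue job, and task names are placeholders:

```python
# Hypothetical daily pipeline: trigger an existing Glue job, then run a
# Python-based data-quality check.
from datetime import datetime
from airflow import DAG
from airflow.operators.python import PythonOperator
from airflow.providers.amazon.aws.operators.glue import GlueJobOperator

def check_row_counts(**context):
    # Placeholder for a data-quality check (e.g., query Athena or Redshift).
    pass

with DAG(
    dag_id="daily_events_pipeline",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    run_glue_etl = GlueJobOperator(
        task_id="run_glue_etl",
        job_name="curate-events",   # existing Glue job name (placeholder)
        region_name="eu-west-1",
    )

    quality_check = PythonOperator(
        task_id="quality_check",
        python_callable=check_row_counts,
    )

    run_glue_etl >> quality_check
```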
Software Engineering Practices
- Proficiency with Git (branching, pull requests, code reviews) and CI/CD pipelines.
- Understanding of release management and automated testing in a multi-environment setup.
Big Data Ecosystems
- Hands-on experience with distributed processing frameworks (such as Spark) and large-scale data lake solutions, including Iceberg tables (see the sketch after this block).
- Familiarity with data lake architectures, partitioning strategies, and best practices.
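A hypothetical sketch of creating and loading a partitioned Iceberg table from PySpark, assuming Spark is configured with the Iceberg runtime and a Glue-Catalog-backed Iceberg catalog (here named glue_catalog); all database, table, and column names are placeholders:

```python
# Hypothetical Iceberg example: partitioned table DDL plus an append.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("iceberg-example").getOrCreate()

# Create an Iceberg table partitioned by day of the event timestamp.
spark.sql("""
    CREATE TABLE IF NOT EXISTS glue_catalog.curated_db.events (
        event_id STRING,
        event_ts TIMESTAMP,
        payload  STRING
    )
    USING iceberg
    PARTITIONED BY (days(event_ts))
""")

# Append new rows; Iceberg manages partition layout and table metadata.
spark.sql("""
    INSERT INTO glue_catalog.curated_db.events
    SELECT event_id, event_ts, payload
    FROM glue_catalog.curated_db.staging_events
""")
```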
Hard skills that would be a real plus:
- Experience with dimensional modelling, partitioning strategies, or other best practices.
- SQL knowledge for properly designing and maintaining data schemas for efficient querying and reporting.
- Infrastructure as Code: familiarity with tools like Terraform or CloudFormation for automated AWS provisioning.
- Event-Driven Architectures: experience with event-driven architectures and working with AWS EventBridge, Lambda, SQS, and SNS (as sketched below).
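As a sketch of the event-driven pattern mentioned above, a hypothetical Python Lambda handler that consumes an EventBridge event and forwards its detail payload to SQS; the queue URL and event fields are placeholders:

```python
# Hypothetical Lambda handler in an EventBridge -> Lambda -> SQS flow.
import json
import boto3

sqs = boto3.client("sqs")
QUEUE_URL = "https://sqs.eu-west-1.amazonaws.com/123456789012/example-queue"

def handler(event, context):
    # EventBridge delivers the custom payload under the "detail" key.
    detail = event.get("detail", {})
    sqs.send_message(
        QueueUrl=QUEUE_URL,
        MessageBody=json.dumps(detail),
    )
    return {"status": "queued", "detail_type": event.get("detail-type")}
```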
Soft skills:
- Effective communication skills for Scrum team collaboration.
- Ownership & Accountability: Drive projects independently and take full responsibility for deliverables.
- Effective Communication: Ability to explain technical details to both technical and non-technical stakeholders.
- Strong Problem-Solving: Ability to dissect complex issues, propose scalable solutions, and optimize workflows.
- Team Collaboration: Comfortable working with cross-functional teams (Data Scientists, Developers, Analysts, etc.).
- Adaptability & Continuous Learning: Eagerness to explore emerging technologies and refine existing processes.
Remote Work:
Yes
Employment Type:
Full-time