AWS Data Engineer with EMR clusters
Piscataway, NJ (local candidates only)
12 months
- Data Pipeline Development: Design and implement robust ETL processes to extract, transform, and load data from various sources into data lakes and warehouses.
- AWS EMR Clusters: Configure, manage, and optimize Amazon EMR clusters for big data processing using Apache Spark, Hive, or Presto.
- Container Orchestration: Use Kubernetes to deploy, scale, and manage containerized applications and services.
- CI/CD Implementation: Develop and maintain CI/CD pipelines for automated deployment of data applications and services using tools such as Jenkins, GitLab CI, or AWS CodePipeline.
- SQL Development: Write complex SQL queries for data manipulation and retrieval, ensuring high performance and scalability.
- Data Quality and Governance: Implement data quality checks, monitoring, and logging mechanisms to ensure data reliability and compliance.
- Collaboration: Work closely with data scientists, analysts, and other stakeholders to understand data requirements and deliver solutions that meet business needs.
- Documentation: Maintain comprehensive documentation of data architecture, processes, and workflows.
Qualifications:
- Education: Bachelor's degree in Computer Science, Data Science, Information Technology, or a related field.
- Experience: 3 years of experience in data engineering or a related role, with a focus on AWS technologies.
- AWS Expertise: Proficient in AWS services such as EMR, S3, RDS, Redshift, Lambda, and CloudFormation.
- Kubernetes Knowledge: Experience with Kubernetes for container orchestration and microservices architecture.
- CI/CD Tools: Familiarity with CI/CD tools and practices, including version control with Git.
- SQL Proficiency: Strong knowledge of SQL, with experience in relational databases (e.g., MySQL, PostgreSQL) and data warehousing solutions.
- Programming Skills: Proficiency in programming languages such as Python, Java, or Scala for data processing tasks.
- Problem-Solving Skills: Ability to troubleshoot complex data issues and optimize performance.
- Communication: Excellent communication skills and the ability to work collaboratively in a team environment.