Job Title: Cloud Engineer (Spark/Databricks Specialist)
Location: Remote
Job Type: Contract
Industry: IT/Cloud Engineering
Job Summary:
We are looking for a highly skilled Cloud Engineer specializing in Apache Spark and Databricks to join our team. The ideal candidate will have extensive experience with cloud platforms such as AWS, Azure, and GCP, and a deep understanding of data engineering, ETL processes, and cloud-native tools. Your primary responsibility will be to design, develop, and maintain scalable data pipelines using Spark and Databricks while optimizing performance and ensuring data integrity across diverse environments.
Key Responsibilities:
Design and Development:
- Architect, develop, and maintain scalable ETL pipelines using Databricks, Apache Spark (Scala, Python), and other cloud-native tools such as AWS Glue, Azure Data Factory, and GCP Dataflow.
- Design and build data lakes and data warehouses on cloud platforms (AWS, Azure, GCP).
- Implement efficient data ingestion, transformation, and processing workflows with Spark and Databricks.
- Optimize the performance of ETL processes for faster data processing and lower costs.
- Develop and manage data pipelines using other ETL tools, such as Informatica and SAP Data Intelligence, as needed.
Data Integration and Management:
- Integrate structured and unstructured data sources (relational databases, APIs, ERP systems) into the cloud data infrastructure.
- Ensure data quality, validation, and integrity through rigorous testing.
- Perform data extraction and integration from SAP or ERP systems, ensuring seamless data flow.
Performance Optimization:
- Monitor, troubleshoot, and enhance the performance of Spark/Databricks pipelines.
- Implement best practices for data governance, security, and compliance across data workflows.
Collaboration and Communication:
- Collaborate with cross-functional teams, including data scientists, analysts, and business stakeholders, to define data requirements and deliver scalable solutions.
- Provide technical guidance and recommendations on cloud data engineering processes and tools.
Documentation and Maintenance:
- Document data engineering solutions, ETL pipelines, and workflows.
- Maintain and support existing data pipelines, ensuring they operate effectively and align with business goals.
Qualifications:
Education:
- Bachelor's degree in Computer Science, Information Technology, or a related field. Advanced degrees are a plus.
Experience:
- 7 years of experience in cloud data engineering or similar roles.
- Expertise in Apache Spark and Databricks for data processing.
- Proven experience with cloud platforms such as AWS, Azure, and GCP.
- Experience with cloud-native ETL and streaming tools such as AWS Glue, Azure Data Factory, Kafka, and GCP Dataflow.
- Hands-on experience with data platforms such as Redshift, Snowflake, Azure Synapse, and BigQuery.
- Experience in extracting data from SAP or ERP systems is preferred.
- Strong programming skills in Python, Scala, or Java.
- Proficient in SQL and query optimization techniques.
Skills:
- In-depth knowledge of Spark/Scala for high-performance data processing.
- Strong understanding of data modeling, ETL/ELT processes, and data warehousing concepts.
- Familiarity with data governance, security, and compliance best practices.
- Excellent problem-solving, communication, and collaboration skills.
Preferred Qualifications:
- Certifications in cloud platforms (e.g., AWS Certified Data Analytics, Google Professional Data Engineer, Azure Data Engineer Associate).
- Experience with CI/CD pipelines and DevOps practices for data engineering.
- Exposure to Apache Hadoop, Kafka, or other data frameworks is a plus.