Mandatory Skills:
Proficient in Python (including popular Python packages, e.g. Pandas, NumPy) and SQL
Strong background in distributed data processing and storage (e.g. Apache Spark, Apache Hadoop)
Large-scale (TBs of data) data engineering skills: model data, create production-ready ETL pipelines
Development experience with at least one cloud (Azure highly preferred; AWS, GCP)
Knowledge of data lake and data lakehouse patterns
Knowledge of ETL performance tuning and cost optimization
Knowledge of data structures and algorithms, and good software engineering practices
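To make the Python/Pandas and ETL expectations above concrete, here is a minimal extract, transform, load sketch. The column names and data are illustrative only and are not part of the role description:

```python
import pandas as pd

# Extract: in practice this would read from a source system
# (e.g. pd.read_parquet or a Spark read); inline records used here.
raw = pd.DataFrame(
    {
        "order_id": [1, 2, 3, 4],
        "region": ["east", "east", "west", "west"],
        "amount": [10.0, None, 25.0, 5.0],
    }
)

# Transform: a basic data-quality step (drop rows missing the amount)
# followed by an aggregation per region.
clean = raw.dropna(subset=["amount"])
revenue_by_region = (
    clean.groupby("region", as_index=False)["amount"]
    .sum()
    .rename(columns={"amount": "total_amount"})
)

# Load: a production pipeline would write to a warehouse or lake
# (e.g. to_parquet); here we simply materialize the result.
result = revenue_by_region.to_dict(orient="records")
print(result)
```

A production-ready version of the same pattern would add schema validation, incremental loads, and idempotent writes, which is the level of rigor the role calls for.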
Good-to-have Skills:
Experience with Azure Databricks
Knowledge of DevOps and ETL pipeline orchestration
Tools: GitHub Actions, Terraform, Databricks Workflows, Azure DevOps
Certifications: Databricks, Azure, AWS, GCP highly preferred
Knowledge of code version control (e.g. Git)
Education and experience:
Bachelor's or Master's degree (or equivalent) in Computer Science, Engineering, or a related field
3 years of hands-on experience in data engineering
Thorough understanding of big data principles techniques and best practices.