We are looking for strong engineers whom we can teach business context within a short amount of time. You will join a world-class data engineering team with a design-first attitude, focused on building quality, scalable, and testable code with rigor on data engineering practices.
- Design, build, and maintain robust and efficient data pipelines to support various products and business needs.
- Design data models for efficient storage and retrieval to meet critical product and business requirements.
- Build scalable data pipelines (SparkSQL & Scala) leveraging the Airflow scheduler/executor framework.
- Implement SQL-based data quality checks and investigate data issues.
- Collaborate with cross-functional teams, including product managers, engineers, and business partners, to align on data requirements and develop scalable systems.
Requirements
- 5-9+ years of relevant experience.
- Extensive experience designing, building, and operating robust distributed data platforms (e.g., Spark, Kafka, Flink, HBase) and handling data at the petabyte scale.
- Strong knowledge of Scala and Python, and expertise with data processing technologies and query authoring (SQL).
- Demonstrated ability to analyze large data sets to identify gaps and inconsistencies, provide data insights, and advance effective product solutions.
- Expertise with ETL schedulers such as Apache Airflow, Luigi, Oozie, AWS Glue, or similar frameworks.
- Solid understanding of data warehousing concepts and hands-on experience with relational databases (e.g., PostgreSQL, MySQL) and columnar databases (e.g., Redshift, BigQuery, HBase, ClickHouse).
- Excellent written and verbal communication skills