Overview:
The Data Warehouse Engineer plays a crucial role in designing, developing, and maintaining the organization's data warehouse infrastructure. This role is vital in ensuring that the data warehouse meets the organization's data storage and analysis needs, providing reliable and efficient access to its data assets.
Key Responsibilities:
- Based on the company's data warehouse specifications and business requirements, build a universal and flexible data warehouse system that can quickly support new needs and reduce repetitive development effort.
- Design, develop, test, and deploy data models; monitor online data jobs; and quickly solve complex problems, especially the optimization of complex calculation logic and performance tuning.
- Participate in data governance, including the construction of the company's metadata management system and data quality monitoring system.
- Design and implement a data platform that integrates the data lake and data warehouse to support real-time data processing and analysis requirements.
- Build knowledge graphs and provide in-depth business insights.
- Participate in technical team building and learning, and contribute to the team's overall knowledge accumulation and skill improvement.
Required Qualifications:
- 5+ years of experience in data lake and data warehouse design and development.
- Deep understanding of data warehouse modeling and data governance. Solid knowledge of data warehouse development methodologies, including dimensional modeling, the corporate information factory, etc.
- Proficient in at least one of Java, Scala, or Python, as well as Hive and Spark SQL.
- Familiar with OLAP technologies (such as Kylin, Impala, Presto, Druid, etc.).
- Proficient in big data batch pipeline development.
- Familiar with big data components, including but not limited to Hadoop, Hive, Spark, Delta Lake, Hudi, Presto, HBase, Kafka, ZooKeeper, Airflow, Elasticsearch, Redis, etc.
- Experience with AWS big data services is a plus.
- Strong team collaboration attitude, with the ability to develop partnerships with other teams and business units.
- Rich experience in real-time data processing; familiar with stream processing frameworks such as Apache Kafka and Apache Flink; in-depth knowledge of lakehouse technology with practical project experience; proficiency in StarRocks, including its data model design, query optimization, and performance tuning.
- Experience in knowledge graph construction and application, and knowledge of graph databases such as Nebula.