Job Title: Python/PySpark Big Data ETL Engineer
Location: Princeton, NJ (Hybrid)
Duration: 6 months
Job Description
To be successful:
- You should have a working knowledge of industry-standard data infrastructure tools (e.g., data warehousing, BI, analytics, big data) with the goal of providing end users with analytics at the speed of thought.
- You should be proficient at developing, architecting, standardizing, and supporting technology platforms using industry-leading ETL solutions.
- You should thrive in building scalable, high-throughput systems.
- You should have experience with agile BI and ETL practices to assist with interim data preparation for data discovery and self-service needs.
- You must have strong communication, presentation, problem-solving, and troubleshooting skills.
- You should be highly motivated to drive innovation company-wide.
You'll need to have:
- 5 years of experience designing and developing ETL pipelines using PySpark/Python (a brief illustrative sketch follows this list).
- Strong understanding of data warehousing methodologies, ETL processing, and dimensional data modeling.
- Advanced SQL capabilities are required. Knowledge of database design techniques and experience working with extremely large data volumes are a plus.
- Demonstrated experience and ability to work with business users to gather requirements and manage scope.
- Experience with workflow orchestration tools such as Airflow or Tidal.
- Experience working in a big data environment with technologies such as Greenplum, Hadoop, and Hive.
- BA/BS/MS/PhD in Computer Science, Engineering, or a related technology field.
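For a flavor of the day-to-day work, here is a minimal PySpark ETL sketch. All paths, table names, and columns are hypothetical placeholders, not details of this role's actual environment:

    # Minimal PySpark ETL sketch: extract raw events, transform them, and
    # load the result into a partitioned warehouse table.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("daily_events_etl").getOrCreate()

    # Extract: read one day of raw JSON events (path is illustrative)
    raw = spark.read.json("hdfs:///data/raw/events/2024-01-01/")

    # Transform: drop malformed rows, deduplicate, derive a partition column
    events = (
        raw.filter(F.col("event_type").isNotNull())
           .dropDuplicates(["event_id"])
           .withColumn("event_date", F.to_date("event_ts"))
    )

    # Load: append into a Hive-compatible table partitioned by event_date
    (events.write
           .mode("append")
           .partitionBy("event_date")
           .format("parquet")
           .saveAsTable("analytics.fct_events"))

In practice, a workflow tool such as Airflow or Tidal would schedule a job like this daily and handle retries and backfills.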
We'd love to see:
- Experience with large database and DW implementations (20 TB).
- Understanding of VLDB performance aspects such as table partitioning, sharding, table distribution, and optimization techniques (a brief example follows this list).
- Knowledge of reporting tools such as Qlik Sense, Tableau, or Cognos.
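To make the partitioning point concrete, here is a minimal Spark SQL sketch of partition pruning; the table and column names continue the hypothetical example above:

    # Illustrative only: filtering on the partition column lets Spark read
    # just the matching partitions instead of scanning the whole table.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("partition_pruning_demo").getOrCreate()

    daily = spark.sql("""
        SELECT event_type, COUNT(*) AS cnt
        FROM analytics.fct_events
        WHERE event_date = DATE '2024-01-01'  -- prunes to one partition
        GROUP BY event_type
    """)
    daily.explain()  # for a parquet-backed table, the plan lists the partition filters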