Essential Duties and Responsibilities:
- Create and maintain optimal data pipeline architecture.
- Assemble large, complex data sets that meet functional and non-functional business requirements.
- Identify, design, and implement internal process improvements: automating manual processes, optimizing data delivery, redesigning infrastructure for greater scalability, etc.
- Build the infrastructure required for optimal extraction, transformation, and loading (ETL) of data from a wide variety of data sources using SQL and scripting tools (see the sketch after this list).
- Ensure that data transformations are well documented, follow best practices, and are understood by business users and developers.
- Understand the relationships across business information and units of data.
- Reverse engineer data from various source systems.
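A minimal sketch of the extract-transform-load duty referenced above, combining plain SQL with Python scripting. It is illustrative only: the database paths and the raw_orders / dw_orders tables and their columns are hypothetical placeholders, not systems named in this posting.

    # Illustrative ETL sketch: extract with SQL, transform in Python, load into
    # a warehouse table. Table and column names are hypothetical placeholders.
    import sqlite3

    def run_etl(source_path: str, warehouse_path: str) -> None:
        source = sqlite3.connect(source_path)
        warehouse = sqlite3.connect(warehouse_path)

        # Extract: pull rows from the source system with plain SQL.
        rows = source.execute(
            "SELECT order_id, amount, order_date FROM raw_orders"
        ).fetchall()

        # Transform: round amounts to two decimals and drop rows with no amount.
        cleaned = [
            (order_id, round(amount, 2), order_date)
            for order_id, amount, order_date in rows
            if amount is not None
        ]

        # Load: write the cleaned rows into the warehouse table.
        warehouse.execute(
            "CREATE TABLE IF NOT EXISTS dw_orders "
            "(order_id INTEGER PRIMARY KEY, amount REAL, order_date TEXT)"
        )
        warehouse.executemany(
            "INSERT OR REPLACE INTO dw_orders VALUES (?, ?, ?)", cleaned
        )
        warehouse.commit()
        source.close()
        warehouse.close()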
Technical Skills:
- Working with cloud services, data lake storage, Hadoop, Spark, Python/Scala, Hive, HDFS, and SQL and NoSQL databases.
- Knowledge of creating batch or real-time data pipelines with on-premise or cloud services, ETL tools, Kafka, etc.
- Performance optimization of complex ETL mappings for relational and non-relational workloads.
- Hands-on Unix shell scripting or PowerShell.
- Develop data warehouse models, ensuring data design follows the prescribed reference architecture framework while reflecting appropriate business rules across conceptual, logical, and physical models.
- Knowledge of setting up code versioning with GitHub and CI/CD pipelines.
- Experience with data pipeline and workflow management tools: Azkaban, Luigi, Airflow (a minimal example follows this list).
- Experience with big data tools: Hadoop, Spark, or Kafka.
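For the workflow management tools above, a minimal example of a daily batch pipeline, assuming Apache Airflow 2.x; the DAG id, commands, and script paths are hypothetical placeholders, not artifacts of this role.

    # Illustrative Airflow 2.x DAG: a daily extract -> transform -> load pipeline.
    # DAG id, commands, and script paths are placeholders.
    from datetime import datetime

    from airflow import DAG
    from airflow.operators.bash import BashOperator

    with DAG(
        dag_id="daily_batch_pipeline",
        start_date=datetime(2024, 1, 1),
        schedule="@daily",  # older Airflow versions use schedule_interval instead
        catchup=False,
    ) as dag:
        extract = BashOperator(task_id="extract", bash_command="python extract.py")
        transform = BashOperator(
            task_id="transform", bash_command="spark-submit transform_job.py"
        )
        load = BashOperator(task_id="load", bash_command="python load.py")

        # Run the tasks in order: extract, then transform, then load.
        extract >> transform >> load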