Create and maintain optimal data pipeline architecture.
Assemble large, complex data sets that meet functional and non-functional business requirements.
Identify, design, and implement internal process improvements: automating manual processes, optimizing data delivery, redesigning infrastructure for greater scalability, etc.
Build the infrastructure required for optimal extraction, transformation, and loading of data from a wide variety of data sources using SQL and scripting tools.
Ensure that data transformations are well documented, follow best practices, and are understood by business users and developers.
Understand the relationships across business information and units of data. Reverse engineer data from various source systems.
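To give a concrete picture of the extract/transform/load responsibility described above, here is a minimal sketch using only Python's standard library. The table and column names (raw_orders, orders_clean, amount_cents) are illustrative assumptions, not part of the role description; a production pipeline would run against the employer's actual sources.

```python
# Minimal ETL sketch (hypothetical tables, standard library only).
import sqlite3

def run_etl(conn: sqlite3.Connection) -> int:
    """Extract rows from a source table, transform them, and load them
    into a target table. Returns the number of rows loaded."""
    cur = conn.cursor()
    # Extract: pull raw records from the (hypothetical) source table.
    rows = cur.execute("SELECT id, amount FROM raw_orders").fetchall()
    # Transform: drop invalid rows and normalize amounts to cents.
    cleaned = [(i, round(a * 100))
               for i, a in rows
               if a is not None and a >= 0]
    # Load: write the transformed rows into the target table.
    cur.executemany(
        "INSERT INTO orders_clean (id, amount_cents) VALUES (?, ?)",
        cleaned)
    conn.commit()
    return len(cleaned)

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE raw_orders (id INTEGER, amount REAL)")
    conn.execute("CREATE TABLE orders_clean (id INTEGER, amount_cents INTEGER)")
    conn.executemany("INSERT INTO raw_orders VALUES (?, ?)",
                     [(1, 9.99), (2, -5.0), (3, None)])
    print(run_etl(conn))  # only the valid row survives the transform
```

The same extract/transform/load shape applies whether the source is a relational database, HDFS, or a cloud data lake; only the connectors change.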
Technical Skills:
Working with cloud services, data lake storage, Hadoop, Spark, Python/Scala, Hive, HDFS, and SQL and NoSQL databases.
Knowledge of creating batch or real-time data pipelines with on-premise or cloud services, ETL tools, Kafka, etc.
Performance optimization of complex ETL mappings for relational and non-relational workloads.
Hands-on Unix shell scripting or PowerShell.
Develop data warehouse models, ensuring the data design follows the prescribed reference architecture framework while reflecting appropriate business rules across the conceptual, logical, and physical models.
Knowledge of setting up code versioning with GitHub and CI/CD pipelines.
Experience with data pipeline and workflow management tools: Azkaban, Luigi, Airflow.
Experience with big data tools: Hadoop, Spark, or Kafka.
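Workflow managers such as Azkaban, Luigi, and Airflow all share one core idea: tasks declare their upstream dependencies and execute in topological order. A toy sketch of that idea in standard-library Python (the task names are illustrative assumptions, not from this posting):

```python
# Toy sketch of the dependency-ordering idea behind workflow managers
# like Airflow or Luigi; not a substitute for those tools.
from graphlib import TopologicalSorter

def run_pipeline(dag: dict[str, set[str]]) -> list[str]:
    """Run each task only after all of its upstream dependencies,
    returning the execution order."""
    order = list(TopologicalSorter(dag).static_order())
    for task in order:
        pass  # a real runner would invoke the task's callable here
    return order

# extract -> transform -> load, with a side branch for quality checks
dag = {
    "transform": {"extract"},
    "quality_check": {"extract"},
    "load": {"transform", "quality_check"},
}
print(run_pipeline(dag))
```

In a real scheduler the DAG also carries retry policies, schedules, and alerting; the ordering guarantee shown here is the common foundation.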
Disclaimer: Drjobpro.com is only a platform that connects job seekers and employers. Applicants are advised to conduct their own independent research into the credentials of the prospective employer. We make certain that our clients do not make any requests for money, so we advise against sharing any personal or bank-related information with any third party. If you suspect fraud or malpractice, please contact us via the Contact Us page.