ETL Software Developer Databricks 8100-1314
Job Location

Toronto, Canada

Salary

Not Disclosed

Job Description

HM Note: This is a hybrid role with three (3) days in the office.

This role is responsible for designing, developing, maintaining and optimizing ETL (Extract, Transform, Load) processes in Databricks for data warehousing, data lakes and analytics. The developer will work closely with data architects and business teams to ensure the efficient transformation and movement of data to meet business needs, including handling Change Data Capture (CDC) and streaming data.

Tools used are:
  • Azure Databricks, Delta Lake, Delta Live Tables and Spark to process structured and unstructured data.
  • Azure Databricks/PySpark (good Python/PySpark knowledge required) to build transformations of raw data into the curated zone in the data lake.
  • Azure Databricks/PySpark/SQL (good SQL knowledge required) to develop and/or troubleshoot transformations of curated data into FHIR.

Data design
o Understand the requirements. Recommend changes to models to support ETL design.
o Define primary keys, indexing strategies and relationships that enhance data integrity and performance across layers.
o Define the initial schemas for each data layer.
o Assist with data modelling and updates of source-to-target mapping documentation.
o Document and implement schema validation rules to ensure incoming data conforms to expected formats and standards.
o Design data quality checks within the pipeline to catch inconsistencies, missing values or errors early in the process.
o Proactively communicate with business and IT experts on any changes required to conceptual, logical and physical models; communicate and review timelines, dependencies and risks.
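To illustrate what "schema validation rules" and "data quality checks" can mean in practice, here is a minimal pure-Python sketch. The table columns (`patient_id`, `name`, `birth_year`) and the helper `validate_rows` are hypothetical; on Databricks the same rules would more likely be expressed in PySpark or as Delta Live Tables expectations.

```python
from dataclasses import dataclass, field

# Hypothetical expected schema for a curated-zone table: column -> Python type.
EXPECTED_SCHEMA = {"patient_id": int, "name": str, "birth_year": int}
# Hypothetical set of columns that must not be missing or null.
REQUIRED = {"patient_id", "name"}


@dataclass
class ValidationResult:
    valid: list = field(default_factory=list)
    rejected: list = field(default_factory=list)  # (row, list of reasons)


def validate_rows(rows, schema=EXPECTED_SCHEMA, required=REQUIRED):
    """Split rows into valid and rejected, recording why each row was rejected."""
    result = ValidationResult()
    for row in rows:
        reasons = []
        # Schema check: columns not defined in the expected schema.
        extra = set(row) - set(schema)
        if extra:
            reasons.append(f"unexpected columns: {sorted(extra)}")
        for col, typ in schema.items():
            value = row.get(col)
            if value is None:
                # Quality check: required values must be present.
                if col in required:
                    reasons.append(f"missing required column: {col}")
            elif not isinstance(value, typ):
                # Format check: value must match the expected type.
                reasons.append(f"{col}: expected {typ.__name__}, got {type(value).__name__}")
        if reasons:
            result.rejected.append((row, reasons))
        else:
            result.valid.append(row)
    return result
```

Keeping rejected rows together with their reasons (rather than silently dropping them) supports the "catch errors early" goal, since rejects can be logged or routed to a quarantine table for review.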

Development of ETL strategy and solution for different sets of data modules
o Understand the tables and relationships in the data model.
o Create low-level design documents and test cases for ETL development.
o Implement error-catching, logging and retry mechanisms, and handle data anomalies.
o Create the workflows and pipeline design.
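The error-catching, logging and retry item can be sketched as a small wrapper around a pipeline step. This is an illustrative pattern, not a Databricks API: `run_with_retry`, its parameters and the exponential backoff policy are assumptions, and the `sleep` argument exists only so the delay can be replaced in tests.

```python
import logging
import time

logger = logging.getLogger("etl")


def run_with_retry(step, *, step_name, max_attempts=3, base_delay_s=1.0, sleep=time.sleep):
    """Run an ETL step, logging each failure and retrying with exponential backoff.

    Re-raises the last exception once max_attempts is exhausted.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            result = step()
            logger.info("%s succeeded on attempt %d", step_name, attempt)
            return result
        except Exception:
            # Log the full traceback so the failure is traceable per pipeline step.
            logger.exception("%s failed on attempt %d/%d", step_name, attempt, max_attempts)
            if attempt == max_attempts:
                raise
            # Wait 1x, 2x, 4x ... the base delay between attempts.
            sleep(base_delay_s * 2 ** (attempt - 1))
```

Retries suit transient failures such as a briefly unavailable source connection; data anomalies (for example, rows failing validation) usually need separate handling, since retrying would only reproduce the same error.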

Development and testing of data pipelines with Incremental and Full Load.
o Develop high-quality ETL mappings/scripts/notebooks.
o Develop and maintain pipelines from the Oracle data source to Azure Delta Lake and FHIR.
o Perform unit testing.
o Ensure performance monitoring and improvement.
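The difference between a full load and an incremental (CDC-driven) load can be shown with a small in-memory sketch. The change-record shape (`op` of "I"/"U"/"D", a `seq` ordering field, a `data` payload) is a hypothetical stand-in for what a CDC tool delivers; in Databricks this logic would typically be a Delta Lake `MERGE INTO` or a Delta Live Tables CDC flow rather than Python dictionaries.

```python
from typing import Any

Row = dict[str, Any]


def full_load(source_rows: list[Row], key: str) -> dict[Any, Row]:
    """Full load: rebuild the target entirely from a complete source snapshot."""
    return {row[key]: dict(row) for row in source_rows}


def apply_cdc(target: dict[Any, Row], changes: list[Row], key: str) -> dict[Any, Row]:
    """Incremental load: apply insert/update/delete change records in sequence order.

    Returns a new target mapping; the input target is left unchanged.
    """
    result = dict(target)
    # Apply changes in source commit order so the latest change wins.
    for change in sorted(changes, key=lambda c: c["seq"]):
        k = change["data"][key]
        if change["op"] == "D":
            result.pop(k, None)
        elif change["op"] in ("I", "U"):
            result[k] = dict(change["data"])  # upsert by primary key
        else:
            raise ValueError(f"unknown op: {change['op']!r}")
    return result
```

Sorting by the sequence field matters because change records can arrive out of order; applying an update after a later delete would otherwise resurrect a removed row.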

Performance review, data consistency checks
o Troubleshoot performance and ETL issues; log activity for each pipeline and transformation.
o Review and optimize overall ETL performance.
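A basic data consistency check after a load is to reconcile primary keys between source and target. The sketch below assumes both key sets fit in memory and uses a hypothetical `reconcile` helper; against real tables the same comparison would usually be run as SQL or PySpark queries.

```python
def reconcile(source_keys, target_keys):
    """Compare primary keys between source and target after a load."""
    src, tgt = set(source_keys), set(target_keys)
    return {
        "source_count": len(src),
        "target_count": len(tgt),
        # Rows that should have been loaded but were not.
        "missing_in_target": sorted(src - tgt),
        # Rows in the target with no matching source row (e.g. missed deletes).
        "unexpected_in_target": sorted(tgt - src),
    }
```

Matching counts alone can hide offsetting errors, which is why the sketch also reports the specific keys on each side.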

End-to-end integrated testing for Full Load and Incremental Load

Plan for Go Live Production Deployment.
o Create production deployment steps.
o Configure parameters and scripts for go-live. Test and review the instructions.
o Create release documents and help build and deploy code across servers.

Go Live Support and Review after Go Live.
o Review existing ETL processes and tools, and provide recommendations on improving performance and reducing ETL timelines.
o Review infrastructure and remediate issues for overall process improvement.

Knowledge Transfer to Ministry staff; development of documentation on the work completed.
o Document work and share the ETL end-to-end design, troubleshooting steps, configuration and scripts review.
o Transfer documents, scripts and review of documents to the Ministry.

Skills
Experience and Skill Set Requirements

Must Have Skills
  • 7 years using ETL tools such as Microsoft SSIS, stored procedures, T-SQL
  • 2 years with Delta Lake, Databricks and Azure Databricks pipelines
  • Strong knowledge of Delta Lake for data management and optimization.
  • Familiarity with Databricks Workflows for scheduling and orchestrating tasks.
  • 2 years Python and PySpark
  • Solid understanding of the Medallion Architecture (Bronze, Silver, Gold) and experience implementing it in production environments.
  • Hands-on experience with CDC tools (e.g., GoldenGate) for managing real-time data.
  • SQL Server, Oracle
Experience:
  • Experience of 7 years working with SQL Server, T-SQL, Oracle PL/SQL development or similar relational databases
  • Experience of 2 years working with Azure Data Factory, Databricks and Python development
  • Experience building data ingestion and change data capture using Oracle GoldenGate
  • Experience in designing, developing and implementing ETL pipelines using Databricks and related tools to ingest, transform and store large-scale datasets
  • Experience in leveraging Databricks, Delta Lake, Delta Live Tables and Spark to process structured and unstructured data.
  • Experience building databases and data warehouses, and working with delta and full loads
  • Experience with data modeling and tools, e.g., SAP PowerDesigner, Visio or similar
  • Experience working with SQL Server SSIS or other ETL tools; solid knowledge and experience with SQL scripting
  • Experience developing in an Agile environment
  • Understanding of data warehouse architecture with a Delta Lake
  • Ability to analyze, design, develop, test and document ETL pipelines from detailed and high-level specifications, and assist in troubleshooting.
  • Ability to utilize SQL to perform DDL tasks and complex queries
  • Good knowledge of database performance optimization techniques
  • Ability to assist in the requirements analysis and subsequent developments
  • Ability to conduct unit testing and assist in test preparations to ensure data integrity
  • Work closely with Designers, Business Analysts and other Developers
  • Liaise with Project Managers, Quality Assurance Analysts and Business Intelligence Consultants
  • Design and implement technical enhancements of the Data Warehouse as required.


Development, Database and ETL experience (60 points)
  • Experience in developing and managing ETL pipelines, jobs and workflows in Databricks.
  • Deep understanding of Delta Lake for building data lakes and managing ACID transactions, schema evolution and data versioning.
  • Experience automating ETL pipelines using Delta Live Tables, including handling Change Data Capture (CDC) for incremental data loads.
  • Proficient in structuring data pipelines with the Medallion Architecture to scale data pipelines and ensure data quality.
  • Hands-on experience developing streaming tables in Databricks using Structured Streaming and readStream to handle real-time data.
  • Expertise in integrating CDC tools like GoldenGate or Debezium for processing incremental updates and managing real-time data ingestion.
  • Experience using Unity Catalog to manage data governance and access control, and to ensure compliance.
  • Skilled in managing clusters, jobs, autoscaling, monitoring and performance optimization in Databricks environments.
  • Knowledge of using Databricks Auto Loader for efficient batch and real-time data ingestion.
  • Experience with data governance best practices, including implementing security policies, access control and auditing with Unity Catalog.
  • Proficient in creating and managing Databricks Workflows to orchestrate job dependencies and schedule tasks.
  • Strong knowledge of Python, PySpark and SQL for data manipulation and transformation.
  • Experience integrating Databricks with cloud storage solutions such as Azure Blob Storage, AWS S3 or Google Cloud Storage.
  • Familiarity with external orchestration tools like Azure Data Factory
  • Implementing logical and physical data models
  • Knowledge of FHIR is an asset

Design Documentation and Analysis Skills (20 points)
  • Demonstrated experience in creating design documentation such as:
    o Schema definitions
    o Error handling and logging
    o ETL Process Documentation
    o Job Scheduling and Dependency Management
    o Data Quality and Validation Checks
    o Performance Optimization and Scalability Plans
    o Troubleshooting Guides
    o Data Lineage
    o Security and Access Control Policies applied within ETL
  • Experience in fit-gap analysis, system use case reviews, requirements reviews, coding exercises and reviews.
  • Participate in defect fixing, testing support and development activities for ETL.
  • Analyze and document solution complexity and interdependencies, including providing support for data validation.
  • Strong analytical skills for troubleshooting, problem-solving and ensuring data quality.

Certifications (10 points)
Certified in one or more of the following certifications:
  • Databricks Certified Data Engineer Associate
  • Databricks Certified Professional Data Engineer
  • Microsoft Certified: Azure Data Engineer Associate
  • AWS Certified Data Analytics Specialty
  • Google Cloud Professional Data Engineer

Communication, Leadership Skills and Knowledge Transfer (10 points)
  • Ability to collaborate effectively with cross-functional teams and communicate complex technical concepts to non-technical stakeholders.
  • Strong problem-solving skills and experience working in an Agile or Scrum environment.
  • Ability to provide technical guidance and support to other team members on Databricks best practices.
  • Must have previous work experience in conducting Knowledge Transfer sessions, ensuring the resources receive the required knowledge to support the system.
  • Must develop documentation and materials as part of a review and knowledge transfer to other members.

Employment Type

Full Time