Job Description:
Proven experience in assembling large, complex data sets that meet functional and non-functional business requirements.
Good exposure to working with Azure Databricks, PySpark, Spark SQL, and Scala (lazy evaluation, Delta tables, Parquet formats, and working with large, complex datasets).
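To illustrate, a minimal PySpark sketch of lazy evaluation feeding a Delta table write; the paths and column names here are hypothetical:

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.getOrCreate()

    # Transformations only build a lazy plan; nothing runs until an action is called.
    orders = spark.read.parquet("/mnt/raw/orders")    # hypothetical path
    big = orders.filter(F.col("amount") > 1000)       # lazy
    daily = big.groupBy("order_date").count()         # still lazy

    # The write is the action that triggers execution; Delta format adds ACID guarantees.
    daily.write.format("delta").mode("overwrite").save("/mnt/curated/daily_orders")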
Experience/knowledge of ETL data pipelines and data flow techniques using Azure Data Services.
Leverage Databricks features such as Delta Lake, Unity Catalog, and advanced Spark configurations for efficient data management.
Debug Spark jobs, analyze performance issues, and implement optimizations to ensure pipelines meet SLAs for latency and throughput.
Implement data partitioning, caching, and clustering strategies for performance tuning.
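A sketch of what such tuning can look like in practice (table paths and column names are assumptions, not part of the role):

    # Partition on a low-cardinality column so queries can prune files.
    daily.write.format("delta").mode("overwrite") \
        .partitionBy("order_date").save("/mnt/curated/orders_partitioned")

    # Cache a DataFrame that several downstream steps reuse.
    hot = spark.read.format("delta").load("/mnt/curated/orders_partitioned").cache()
    hot.count()  # an action, to materialize the cache

    # Z-order clustering co-locates related rows within files (Databricks Delta).
    spark.sql("OPTIMIZE delta.`/mnt/curated/orders_partitioned` ZORDER BY (customer_id)")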
Good understanding of SQL databases, NoSQL databases, data warehouses, Hadoop, and the various data storage options on the cloud.
Develop and manage CI/CD pipelines for deploying Databricks notebooks and jobs, using tools like Azure DevOps, Git, or Jenkins for version control and automation.
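As an illustrative sketch, the kind of deployment step such a pipeline might run, calling the Databricks Jobs API from Python; the host, token, notebook path, and job name are placeholders:

    import os
    import requests

    # Credentials injected by the CI system, e.g. Azure DevOps pipeline variables.
    host = os.environ["DATABRICKS_HOST"]
    token = os.environ["DATABRICKS_TOKEN"]

    # Register a job that runs a deployed notebook (Jobs API 2.1).
    job_spec = {
        "name": "nightly-orders-load",
        "tasks": [{
            "task_key": "load",
            "notebook_task": {"notebook_path": "/Repos/data/etl/load_orders"},
            "new_cluster": {
                "spark_version": "13.3.x-scala2.12",
                "node_type_id": "Standard_DS3_v2",
                "num_workers": 2,
            },
        }],
    }

    resp = requests.post(f"{host}/api/2.1/jobs/create",
                         headers={"Authorization": f"Bearer {token}"},
                         json=job_spec, timeout=30)
    resp.raise_for_status()
    print("Created job", resp.json()["job_id"])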
Experience in development projects as a Data Architect.
Must-have skills: Data Factory, Databricks, Databricks architecture, Synapse, PySpark/Python and Spark, Azure DB (Azure SQL / Cosmos DB).
Integrate data validation frameworks like Great Expectations and implement data quality checks to ensure reliability and consistency.
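For example, a minimal check using Great Expectations' legacy SparkDFDataset wrapper (newer releases use Fluent Datasources, so adapt to the installed version; the DataFrame and column names are hypothetical):

    import great_expectations as ge

    # Wrap an existing Spark DataFrame to attach expectations to it.
    gdf = ge.dataset.SparkDFDataset(daily)

    gdf.expect_column_values_to_not_be_null("order_date")
    gdf.expect_column_values_to_be_between("count", min_value=0, max_value=None)

    # Fail fast so bad data never reaches downstream consumers.
    results = gdf.validate()
    if not results.success:
        raise ValueError("Data quality checks failed; halting the pipeline")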
Build and maintain a Lakehouse architecture in ADLS / Databricks.
Manage access controls and ensure compliance with data governance policies using Unity Catalog and Role-Based Access Control (RBAC).
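A minimal sketch of Unity Catalog grants issued from a notebook; the catalog, schema, and group names are assumptions:

    # Grant an account-level group read access to one schema.
    spark.sql("GRANT USE CATALOG ON CATALOG main TO `analysts`")
    spark.sql("GRANT USE SCHEMA, SELECT ON SCHEMA main.sales TO `analysts`")

    # Revoke when access is no longer required.
    spark.sql("REVOKE SELECT ON SCHEMA main.sales FROM `analysts`")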
Experience integrating different data sources.
Good experience with Snowflake is an added advantage.
Experience supporting BI and Data Science teams in consuming data in a secure and governed manner.
Create and maintain comprehensive documentation for data processes, procedures, and architecture designs.