
Data Engineer (Python, PySpark, Apache Airflow, NoSQL)

Employer Active

1 Vacancy
Job Location

Bangalore/Bengaluru - India

Salary

Not Disclosed



Job Description

Data Engineer (Python, PySpark, Apache Airflow, NoSQL)
Location: Bengaluru, Karnataka, India (Onsite)
Experience: 3-5 years

Responsibilities:
Build, optimize, and maintain scalable ETL pipelines for data ingestion and processing.
Develop and manage workflows using Apache Airflow for scheduling and orchestrating tasks.
Work with distributed computing technologies (PySpark) to handle large-scale datasets.
Design and implement data architectures that scale with growing business needs.
Implement data lake and data warehousing solutions using both structured and unstructured data.
Collaborate with data scientists and analytics teams to ensure data quality and availability.
Optimize existing data models and pipelines for performance and scalability.
Use NoSQL databases (e.g., MongoDB, Cassandra) for large, scalable data storage solutions.
Ensure high data integrity, security, and quality through monitoring and validation processes.
Write clear documentation and maintain data engineering best practices.
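As an illustrative sketch of the orchestration work described above (not part of the posting), a minimal Airflow DAG might wire an ingestion job to a validation step. The DAG id, schedule, and script paths here are hypothetical, and running it assumes an Airflow 2.x deployment:

```python
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.bash import BashOperator

# Hypothetical defaults: retry each task twice with a short backoff.
default_args = {
    "owner": "data-engineering",
    "retries": 2,
    "retry_delay": timedelta(minutes=5),
}

with DAG(
    dag_id="daily_etl",                  # hypothetical DAG id
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    default_args=default_args,
    catchup=False,
) as dag:
    # Submit a PySpark ingestion job; the script path is an assumption.
    ingest = BashOperator(
        task_id="spark_ingest",
        bash_command="spark-submit /opt/jobs/ingest.py",
    )
    # Run a downstream data-quality check once ingestion succeeds.
    validate = BashOperator(
        task_id="validate",
        bash_command="python /opt/jobs/validate.py",
    )
    ingest >> validate  # set task ordering: ingest, then validate
```

This is an orchestration fragment rather than a standalone script: Airflow's scheduler discovers the file and triggers the tasks on the `@daily` schedule.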

Skills & Qualifications:
Strong proficiency in Python, PySpark, and SQL.
Experience working with Apache Airflow for orchestration.
Hands-on experience with distributed computing and big data tools (PySpark, Hadoop).
Familiarity with cloud platforms (AWS, GCP) and tools like S3, EMR, Lambda, etc.
Experience with NoSQL databases (e.g., MongoDB, Cassandra) and relational databases.
Strong understanding of data warehousing concepts, ETL processes, and data lake architecture.
Experience with data pipeline monitoring, logging, and alerting.
Strong knowledge of Docker and containerized environments.
Familiarity with DevOps and CI/CD practices for data engineering.
Excellent problem-solving, communication, and teamwork skills.
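The monitoring and validation duties mentioned above can be sketched in plain Python. The required fields and rejection rule below are hypothetical, chosen only to illustrate logging rejected rows during ingestion:

```python
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("pipeline.validation")

# Hypothetical schema rule: these fields must be present and non-null.
REQUIRED_FIELDS = ("user_id", "event_ts")

def validate_rows(rows):
    """Split rows into accepted and rejected, logging each rejection."""
    valid, errors = [], 0
    for row in rows:
        missing = [f for f in REQUIRED_FIELDS if row.get(f) is None]
        if missing:
            errors += 1
            logger.warning("rejected row, missing fields: %s", missing)
            continue
        valid.append(row)
    return valid, errors

# Example: one complete row, one row missing event_ts.
good, bad = validate_rows([
    {"user_id": 1, "event_ts": "2024-01-01T00:00:00Z"},
    {"user_id": 2, "event_ts": None},
])
```

In a real pipeline the same check would typically run as a task after ingestion, with the error count feeding an alerting threshold.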

About the Company:
CuberaTech, founded in 2020, is a data company revolutionizing Big Data analytics through a data value-share paradigm in which users entrust their data to us. Our deployment of deep-learning techniques enables us to harness this data, making us a source of the richest zero-party data. By stitching together all the relevant pieces of data from zero-, first-, and second-party sources, we enable advertisers to define and create custom audiences that maximize programmatic ROAS.
Website:

pyspark, ci/cd, nosql, cassandra, sql, devops, mongodb, apache airflow, airflow, python, aws, hadoop, docker, apache, gcp

Employment Type

Full Time

Report This Job
Disclaimer: Drjobpro.com is only a platform that connects job seekers and employers. Applicants are advised to conduct their own independent research into the credentials of the prospective employer. We always ensure that our clients do not make requests for money payments, and we advise against sharing any personal or bank-related information with any third party. If you suspect fraud or malpractice, please contact us via the Contact Us page.