
PySpark Hadoop

Employer Active

1 Vacancy
Job Location

Pune - India

Salary

Not Disclosed

Vacancy

1 Vacancy

Job Description

Desired Competencies (Technical/Behavioral Competency)

  • Must-Have: Big Data / Hadoop development experience
  • Good-to-Have: Good communication skills

Responsibility of / Expectations from the Role

  1. Development of end-to-end ETL pipelines using PySpark on Hadoop (a minimal sketch follows this list)
  2. Gathering data requirements from the Product Owner
  3. Data aggregation and complex logic building
  4. Leading a three-person offshore team in the Business Intelligence area
  5. Good understanding of Hadoop data warehousing concepts
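
By way of illustration, a minimal PySpark ETL sketch in the spirit of point 1 above is shown here. The paths, table names, and columns are hypothetical placeholders rather than details from this posting, and the example assumes a Hadoop cluster with a configured Hive metastore.

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Assumes the job runs on a Hadoop cluster with a configured Hive metastore.
spark = (
    SparkSession.builder
    .appName("etl-pipeline-sketch")
    .enableHiveSupport()
    .getOrCreate()
)

# Extract: read raw events from HDFS (CSV chosen only for illustration).
raw = spark.read.option("header", "true").csv("hdfs:///data/raw/events/")

# Transform: cast, aggregate, and derive summary metrics.
daily = (
    raw.withColumn("amount", F.col("amount").cast("double"))
       .groupBy("event_date", "customer_id")
       .agg(F.sum("amount").alias("total_amount"),
            F.count("*").alias("event_count"))
)

# Load: persist the aggregate as a partitioned Hive table.
(daily.write
      .mode("overwrite")
      .partitionBy("event_date")
      .saveAsTable("analytics.daily_customer_summary"))

spark.stop()

Writing the aggregate as a partitioned Hive table keeps downstream reads cheap when queries filter on the partition column.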

Key Responsibilities

  • Design and implement data processing systems using PySpark and Hadoop.
  • Develop and maintain ETL processes for data ingestion and transformation.
  • Optimize data storage and retrieval processes in Hadoop (see the sketch after this list).
  • Collaborate with data scientists to understand data requirements and deliver accurate data sets.
  • Write efficient PySpark applications for handling large datasets.
  • Conduct data quality assessments and resolve data issues.
  • Monitor and troubleshoot performance issues within the Hadoop ecosystem.
  • Create and maintain documentation for data workflows and processes.
  • Participate in code reviews and contribute to best practices in data engineering.
  • Develop data models and maintain validation frameworks.
  • Implement data security and governance standards across data pipelines.
  • Work with cross-functional teams to align data strategies with business objectives.
  • Stay updated with the latest industry trends and technologies related to big data.
  • Train and mentor junior developers in PySpark and Hadoop technologies.
  • Assist in the deployment of production-ready data applications.
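
The storage-optimization and efficiency bullets above often come down to file format and layout. The sketch below shows one common approach (Parquet plus partition pruning); the HDFS paths and columns are made up for illustration, not requirements stated in the posting.

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("storage-layout-sketch").getOrCreate()

events = spark.read.parquet("hdfs:///data/staging/events/")

# Columnar, partitioned output; coalesce keeps the output file count manageable.
(events.coalesce(32)
       .write
       .mode("overwrite")
       .partitionBy("country", "event_date")
       .parquet("hdfs:///data/curated/events/"))

# A partition filter on read touches only the matching HDFS directories.
recent_in = (
    spark.read.parquet("hdfs:///data/curated/events/")
         .where((F.col("country") == "IN") & (F.col("event_date") >= "2024-01-01"))
)
recent_in.select("customer_id", "amount").show(10)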

Required Qualifications

  • Bachelor's degree in Computer Science, Information Technology, or a related field.
  • Minimum of 3 years of experience working with Hadoop and PySpark.
  • Strong programming skills in Python and experience with PySpark.
  • Experience with Hadoop components such as HDFS, Hive, and Pig (see the sketch after this list).
  • Familiarity with distributed computing concepts and technologies.
  • Practical knowledge of SQL and NoSQL databases.
  • Understanding of data warehousing concepts and ETL processes.
  • Ability to work with complex datasets and build data analysis routines.
  • Strong analytical and problem-solving skills.
  • Experience with version control systems like Git.
  • Proficient in data visualization tools and techniques.
  • Excellent communication and teamwork skills.
  • Experience with cloud platforms (AWS, Azure, etc.) is a plus.
  • Ability to adapt to new technologies and tools quickly.
  • Certifications in big data technologies are desirable.
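
Several of the qualifications above (HDFS, Hive, SQL) meet in Spark SQL. A brief sketch follows, again with made-up table and column names; the same query could equally be expressed with the DataFrame API.

from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("hive-sql-sketch")
    .enableHiveSupport()   # assumes Hive metastore access from the cluster
    .getOrCreate()
)

# Plain SQL against a Hive-managed table.
top_customers = spark.sql("""
    SELECT customer_id,
           SUM(total_amount) AS revenue
    FROM analytics.daily_customer_summary
    GROUP BY customer_id
    ORDER BY revenue DESC
    LIMIT 20
""")
top_customers.show()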

Skills: Hadoop ecosystem, Pig, PySpark, SQL proficiency, NoSQL, HDFS, SQL, data modeling, data warehousing, Git, cloud platforms, ETL development, Hive, Python, Hadoop, data analysis, data visualization

Employment Type

Full Time
