Desired Competencies (Technical/Behavioral)
Must-Have: Big Data Hadoop development experience
Good-to-Have: Good communication skills
Responsibilities / Expectations from the Role
1. Development of end-to-end ETL pipelines using PySpark on Hadoop
2. Gathering data requirements from the Product Owner
3. Data aggregation and complex logic building
4. Leading a three-person offshore team in the Business Intelligence area
5. Good understanding of Hadoop data warehousing concepts
Key Responsibilities
- Design and implement data processing systems using PySpark and Hadoop (an illustrative sketch follows this list).
- Develop and maintain ETL processes for data ingestion and transformation.
- Optimize data storage and retrieval processes in Hadoop.
- Collaborate with data scientists to understand data requirements and deliver accurate data sets.
- Write efficient PySpark applications for handling large datasets.
- Conduct data quality assessments and resolve data issues.
- Monitor and troubleshoot performance issues within the Hadoop ecosystem.
- Create and maintain documentation for data workflows and processes.
- Participate in code reviews and contribute to best practices in data engineering.
- Develop data models and maintain validation frameworks.
- Implement data security and governance standards across data pipelines.
- Work with cross-functional teams to align data strategies with business objectives.
- Stay updated with the latest industry trends and technologies related to big data.
- Train and mentor junior developers in PySpark and Hadoop technologies.
- Assist in the deployment of production-ready data applications.
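For illustration only, the sketch below shows the kind of PySpark ETL work these responsibilities describe: reading raw data from HDFS, applying an aggregation, and writing the result to a Hive table. All paths, column names, and table names are hypothetical placeholders, not project specifics.

```python
# Minimal, hypothetical sketch of a PySpark ETL step on Hadoop.
# Paths, columns, and table names are placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (
    SparkSession.builder
    .appName("example-etl")      # hypothetical application name
    .enableHiveSupport()         # needed to write managed Hive tables
    .getOrCreate()
)

# Ingest: read raw event data from HDFS (placeholder path).
raw = spark.read.parquet("hdfs:///data/raw/events")

# Transform: aggregate event counts per day and customer.
daily_counts = (
    raw.withColumn("event_date", F.to_date("event_ts"))
       .groupBy("event_date", "customer_id")
       .agg(F.count("*").alias("event_count"))
)

# Load: persist the aggregate, partitioned by date, into a Hive table.
(
    daily_counts.write
    .mode("overwrite")
    .partitionBy("event_date")
    .saveAsTable("analytics.daily_event_counts")  # hypothetical table name
)

spark.stop()
```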
Required Qualifications
- Bachelor's degree in Computer Science, Information Technology, or a related field.
- Minimum of 3 years of experience working with Hadoop and PySpark.
- Strong programming skills in Python and hands-on experience with PySpark.
- Experience with Hadoop components such as HDFS, Hive, and Pig.
- Familiarity with distributed computing concepts and technologies.
- Practical knowledge of SQL and NoSQL databases.
- Understanding of data warehousing concepts and ETL processes.
- Ability to work with complex datasets and build data analysis routines.
- Strong analytical and problem-solving skills.
- Experience with version control systems like Git.
- Proficient in data visualization tools and techniques.
- Excellent communication and teamwork skills.
- Experience with cloud platforms (AWS, Azure, etc.) is a plus.
- Ability to adapt to new technologies and tools quickly.
- Certifications in big data technologies are an advantage.
Skills
Hadoop ecosystem, HDFS, Hive, Pig, PySpark, Python, SQL, NoSQL, data modeling, data warehousing, ETL development, data analysis, data visualization, Git, cloud platforms