Do you love a career where you can Experience, Grow & Contribute at the same time, while earning at least 10% above the market? If so, we are excited to have come across you.
If you are a PySpark Developer looking for excitement, challenge, and stability in your work, then you will be glad you found this page.
We are an IT Solutions Integrator/Consulting Firm helping our clients hire the right professional for an exciting long-term project. Here are a few details.
Check if you are up for maximizing your earning and growth potential by leveraging our Disruptive Talent Solution.
Role: PySpark Developer
Location: Mumbai
Exp: 5-8 Years
Requirements
We are seeking an experienced PySpark Developer to join our data engineering team. The ideal candidate will have expertise in Apache Spark and Python programming, focusing on building scalable, high-performance data processing pipelines. As a PySpark Developer, you will collaborate with cross-functional teams to design, build, and deploy big data solutions that drive business insights and analytics.
Key Responsibilities:
- Develop, test, and maintain large-scale data processing systems using PySpark and other big data technologies.
- Design and implement data pipelines to extract, transform, and load data from various sources, ensuring scalability and reliability (see the pipeline sketch after this list).
- Work with data scientists, data analysts, and other stakeholders to understand requirements and provide solutions that meet business needs.
- Optimize and tune PySpark applications for efficient performance, including memory management, processing time, and resource utilization.
- Integrate data from multiple sources, managing schemas, data transformations, and data quality.
- Participate in code reviews, design discussions, and performance tuning sessions to ensure high-quality deliverables.
- Document processes, data flows, and other key technical aspects of the developed solutions.
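To give candidates a concrete sense of the day-to-day work, here is a minimal sketch of an extract-transform-load pipeline in PySpark. The bucket paths, column names, and aggregation logic are illustrative assumptions for this example, not details of the actual project.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Create a session; a real deployment would also configure the master,
# executor memory, and shuffle settings.
spark = SparkSession.builder.appName("etl-sketch").getOrCreate()

# Extract: read raw CSV input (the path is a placeholder).
raw = spark.read.option("header", True).csv("s3://example-bucket/raw/orders/")

# Transform: enforce types, drop bad rows, and aggregate per customer.
customer_spend = (
    raw.withColumn("amount", F.col("amount").cast("double"))
       .filter(F.col("amount").isNotNull())
       .groupBy("customer_id")
       .agg(F.sum("amount").alias("total_spend"))
)

# Load: write the curated result as Parquet for downstream analytics.
customer_spend.write.mode("overwrite").parquet("s3://example-bucket/curated/customer_spend/")
```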
Required Skills and Experience:
- Bachelor's degree in Computer Science, Engineering, or a related field (or equivalent experience).
- 2+ years of hands-on experience in PySpark, Spark SQL, and Spark DataFrames.
- Proficiency in Python programming, with experience in data manipulation and processing.
- Strong knowledge of Apache Spark architecture and experience working with large datasets in a distributed environment.
- Experience with SQL and relational databases, including query optimization.
- Familiarity with ETL frameworks and data processing tools (e.g., Hadoop, Hive, Kafka).
- Experience with cloud platforms such as AWS, Azure, or Google Cloud and their respective big data services.
- Understanding of data lake and data warehouse concepts and best practices.
- Knowledge of data partitioning, caching, and other optimization techniques in Spark (illustrated in the sketch after this list).
- Strong problem-solving and debugging skills, with attention to detail and accuracy.
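As an illustration of the partitioning and caching techniques referenced above, the following sketch shows two common Spark optimizations. The table layout, paths, and partition count are assumptions made for the example.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("tuning-sketch").getOrCreate()

# Hypothetical input: a large event table stored as Parquet.
events = spark.read.parquet("s3://example-bucket/events/")

# Partitioning: repartition on a high-cardinality key before wide operations
# so work is spread evenly across executors and shuffle skew is reduced.
by_user = events.repartition(200, "user_id")

# Caching: persist a DataFrame that several downstream queries reuse,
# so Spark keeps it in memory instead of recomputing its lineage each time.
by_user.cache()

daily = by_user.groupBy("user_id", F.to_date("ts").alias("day")).count()
totals = by_user.groupBy("user_id").count()

daily.write.mode("overwrite").parquet("s3://example-bucket/out/daily_counts/")
totals.write.mode("overwrite").parquet("s3://example-bucket/out/user_totals/")
```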
Preferred Qualifications:
- Experience with data orchestration tools (e.g., Apache Airflow).
- Familiarity with NoSQL databases (e.g., Cassandra, MongoDB).
- Knowledge of DevOps practices, including CI/CD and containerization tools (e.g., Docker, Kubernetes).
- Experience with machine learning frameworks integrated with Spark (e.g., MLlib).
Soft Skills:
- Excellent communication skills and ability to work collaboratively in a team environment.
- Strong analytical skills with an ability to translate complex business requirements into scalable technical solutions.
- Ability to manage multiple projects, prioritize tasks, and adapt in a fast-paced environment.