Job Overview:
We are seeking a highly skilled Senior Python & PySpark Developer to join our team. The ideal candidate will have extensive experience with Python development, PySpark, and working with large datasets in distributed computing environments. You will be responsible for designing, implementing, and optimizing data pipelines, ensuring seamless data processing, and contributing to our overall data architecture.
Key Responsibilities:
- Develop, maintain, and optimize scalable data processing pipelines using Python and PySpark.
- Collaborate with cross-functional teams to understand business requirements and translate them into technical specifications.
- Work with large datasets to perform data wrangling, cleansing, and analysis.
- Implement best practices for efficient distributed computing and data processing.
- Optimize existing data pipelines for performance and scalability.
- Conduct code reviews, mentor junior developers, and contribute to team knowledge sharing.
- Develop and maintain technical documentation.
- Troubleshoot, debug, and resolve issues related to data processing.
- Collaborate with data engineers, data scientists, and analysts to deliver high-quality solutions.
Requirements:
- Bachelor's or Master's degree in Computer Science, Data Engineering, or a related field.
- 5 years of experience in Python programming.
- 3 years of hands-on experience with PySpark and distributed data processing frameworks.
- Strong understanding of big data ecosystems (Hadoop, Spark, Hive).
- Experience working with cloud platforms like AWS, GCP, or Azure.
- Proficient with SQL and relational databases.
- Familiarity with ETL processes and data pipelines.
- Strong problem-solving skills with the ability to troubleshoot and optimize code.
- Excellent communication skills and the ability to work in a team-oriented environment.
Preferred Qualifications:
- Experience with Apache Kafka or other real-time data streaming technologies.
- Familiarity with machine learning frameworks (TensorFlow, scikit-learn).
- Experience with Docker, Kubernetes, or other containerization technologies.
- Knowledge of DevOps tools (CI/CD pipelines, Jenkins, Git, etc.).
- Familiarity with data warehousing solutions such as Redshift or Snowflake.
Benefits:
Standard Company Benefits