Job Description:
Role: Senior ETL and Feature Engineer
Job Location: Bangalore
Experience: 4-9 years
Educational Qualification: Engineering degree, preferably from a premier institute
Notice Period: Immediate joiner
About the Job:
In this company, you will have the opportunity to lead this era with Privacy-Enhancing and Responsible AI Technologies. You will be an individual contributor setting up the big data ecosystem of the world's first privacy red teaming and blue teaming platform. You will work on cutting-edge privacy platform requirements with customers across the globe and across industry verticals. As one of the early employees of the organization, you will be given a significant ESOP allocation and will work with some of the brilliant minds from IISc and the IIMs.
Responsibilities:
1. Develop and maintain ETL pipelines to ingest and process large-scale datasets.
2. Develop a Python connector for ETL applications, enabling the collection of historical samples and the generation of intelligent samples for AI/ML tasks.
3. Demonstrate hands-on experience with Apache Kafka, Amazon Kinesis, and AWS Glue.
4. Build ETL pipelines for AI/ML workload integration and execution.
5. Implement and maintain ETL pipelines with a deep understanding of orchestration, scaling, and resource management.
6. Develop ETL samples to manage unstructured data tasks, covering data preprocessing for various types including emails, Office 365 documents, pictures, voice recordings, videos, and PDFs.
7. Establish a solid understanding and practical knowledge of SQL databases, with a focus on query performance optimization and effective index management.
8. Implement an ETL pipeline spanning multiple databases to extract samples from both NoSQL and SQL sources.
9. Work seamlessly in a multi-cloud environment, specifically AWS and Azure.
10. Execute hands-on development and deployment of microservices and containerized applications.
Requirements:
1. Minimum 4 years of hands-on experience setting up the ETL and feature engineering parts of data pipelines on cloud or big data ecosystems.
2. Strong hands-on experience with Apache Kafka, Amazon Kinesis, and AWS Glue.
3. Expert-level knowledge of at least one ETL/workflow tool (Apache SeaTunnel, Apache DolphinScheduler, SSIS, Informatica, Talend, etc.).
4. Experience with big data technologies such as Hadoop, Spark, and Hive.
5. Experience with NoSQL and SQL databases and storage formats (e.g., MongoDB, Snowflake, PostgreSQL, Delta Lake, Parquet).
6. Strong programming skills in Python and experience with data manipulation libraries such as pandas and NumPy.
7. Experience handling unstructured data tasks, including preprocessing for various data types such as email, Office 365 documents, pictures, voice, video, and PDF.
8. Solid SQL knowledge, including query performance tuning, index maintenance, and an understanding of database structure.
9. Good to have: knowledge of Apache Spark, Apache NiFi, Apache Kafka, Apache Airflow, Apache Beam, Kubeflow, and TensorFlow Transform.
10. Expertise in networking, security, and cloud platforms (AWS, Azure, GCP, etc.).