- Design, develop, and maintain a generic ingestion framework capable of processing various types of data (structured, semi-structured, and unstructured) from customer sources.
- Implement and optimize ETL (Extract, Transform, Load) pipelines to ensure data integrity, quality, and reliability as data flows into a centralized datastore such as Elasticsearch.
- Ensure the ingestion framework is scalable, secure, efficient, and capable of handling large volumes of data in real-time or batch processes.
- Continuously monitor and enhance the data ingestion process to improve performance, reduce latency, and handle new data sources and formats.
- Develop automated testing and monitoring tools to ensure the framework operates smoothly and can quickly adapt to changes in data sources or requirements.
- Provide documentation, support, and training to other team members and stakeholders on using the ingestion framework.
- Implement large-scale, near real-time streaming data processing pipelines.
- Design, support, and continuously enhance the project code base, continuous integration pipeline, etc.
- Build analytics tools that utilize the data pipeline to provide actionable insights into key business performance metrics.
- Perform POCs, evaluate different technologies, and continue to improve the overall architecture.
Qualifications :
- Experience building and optimizing big data pipelines, architectures, and data sets.
- Strong proficiency in Elasticsearch, its architecture, and optimal querying of data.
- Strong analytic skills related to working with unstructured datasets.
- Experience supporting and working with cross-functional teams in a dynamic environment.
- Working knowledge of message queuing, stream processing, and highly scalable big data systems.
- One or more years of experience contributing to the architecture and design (architecture, design patterns, reliability, and scaling) of new and current systems.
- Candidates must have 4 to 6 years of experience in a Data Engineer role, with a Bachelor's or Master's degree (preferred) in Computer Science, Information Systems, or an equivalent field. Candidates should have knowledge of the following technologies/tools:
- Experience working on Big Data processing systems such as Hadoop, Spark, Spark Streaming, or Flink Streaming.
- Experience with SQL systems such as Snowflake or Redshift.
- Direct hands-on experience with two or more of these integration technologies: Java/Python, React, Golang, SQL, NoSQL (MongoDB), RESTful APIs.
- Well versed in Agile, APIs, microservices, containerization, etc.
- Experience with CI/CD pipelines running on GitHub, Jenkins, Docker, and EKS.
- Knowledge of at least one distributed datastore, such as MongoDB, DynamoDB, or HBase.
- Experience using batch scheduling frameworks such as Airflow (preferred), Luigi, Azkaban, etc. is a plus.
- Experience with AWS cloud services: EC2, S3, DynamoDB, Elasticsearch.
Additional Information :
We believe that coming together as a community in person is important for innovation, connection, and fostering a sense of belonging. Our roles have the right balance of remote and in-office working to enable flexibility for managing your life, along with ensuring a real connection with your colleagues and the broader IFS community. #LiHybrid
Remote Work :
Yes
Employment Type :
Full-time