Introduction:
An exciting new client is seeking a Senior Machine Learning Engineer for LLM-based data extraction from documents. This role requires expertise in Python and API interactions (the LLMs are accessed through their API endpoints). The ideal candidate will focus on designing, developing, and implementing data extraction systems using advanced Large Language Models (LLMs) such as GPT-4. This is an in-person, office-based position with locations in Hyderabad and Gurgaon, offering a great opportunity to work closely with other team members in a collaborative environment.
Job Responsibilities:
- Design, develop, and implement data extraction systems using Large Language Models (LLMs) such as GPT-4.
- Collaborate with data scientists, engineers, and product managers to understand data extraction requirements and objectives.
- Fine-tune and customize LLMs for specific data extraction tasks, ensuring high accuracy and efficiency.
- Create and maintain data pipelines for the extraction, processing, and storage of large datasets.
- Conduct performance testing and optimization of LLMs to enhance data extraction capabilities.
- Develop and document best practices for LLM-based data extraction processes.
- Stay updated with the latest advancements in AI and LLM technologies to continually improve data extraction methodologies.
- Troubleshoot and resolve issues related to data extraction processes and models.
Job Requirements:
- Bachelor's or Master's degree in Computer Science, Data Science, AI, or a related field.
- Proven experience with LLMs and natural language processing (NLP) technologies.
- Proficiency in programming languages such as Python, with experience in AI/ML libraries (e.g., TensorFlow, PyTorch).
- Strong understanding of data structures, algorithms, and software engineering principles.
- Experience with data extraction, ETL processes, and database management.
- Familiarity with cloud computing platforms (e.g., AWS, Google Cloud, Azure) and containerization technologies (e.g., Docker, Kubernetes).
- Excellent problem-solving skills and the ability to work in a fast-paced, collaborative environment.
- Strong communication skills, both written and verbal, to effectively convey technical concepts to non-technical stakeholders.
Preferred Skills:
- Experience with transformer-based models such as GPT-4 and BERT.
- Knowledge of big data technologies (e.g., Hadoop, Spark) and data warehousing solutions.
- Understanding of regulatory and compliance requirements related to data handling and privacy.
- Prior experience in developing and deploying machine learning models in production environments.
spark, data, computer science, ai/ml libraries, algorithms, software engineering principles, hadoop, api interactions, python, transformer-based models, etl processes, analytical skills, machine learning models, data science, data analytics, engineering, data extraction, data pipelines, tensorflow, big data technologies, performance testing, problem-solving, cloud computing, data structures, containerization technologies, english communication, artificial intelligence, gpt-4, data warehousing solutions, bert, large language models, pytorch, database management, data analysis, communication skills