This role is for one of Weekday's clients.
We are seeking an experienced Machine Learning Engineer with a strong background in data engineering, machine learning, and unstructured data. The ideal candidate will have extensive experience applying advanced machine learning techniques, including working with Large Language Models (LLMs) such as GPT and Gemini and building Retrieval-Augmented Generation (RAG) systems. If you're passionate about deriving valuable insights from complex datasets and have a deep understanding of machine learning, we want to hear from you.
Key Responsibilities:
- Develop and implement machine learning models, especially for extracting insights from unstructured data.
- Work with Large Language Models (LLMs) such as GPT and Gemini, fine-tuning them and optimizing their outputs.
- Design and build Retrieval-Augmented Generation (RAG) systems to enhance information retrieval from large datasets.
- Extract information from unstructured data sources like PDFs using tools such as PDFMiner, PyMuPDF, or PDFPlumber.
- Apply Optical Character Recognition (OCR) techniques for document processing.
- Use vector databases, LangChain, and LlamaIndex for efficient data storage and retrieval.
- Process large volumes of unstructured data, particularly in the insurance, legal, or healthcare sectors.
- Collaborate with data scientists and engineers to enhance model performance and improve outcomes.
- Optimize data workflows using Azure Cloud services.
- Keep up to date with advancements in machine learning and deep learning, especially in NLP, computer vision, and generative AI models.
Requirements:
- 5 years of hands-on experience in machine learning and data engineering.
- Extensive knowledge of data science, machine learning techniques, and data processing.
- Experience working with unstructured data, specifically from PDFs.
- Proficiency with Large Language Models (LLMs) like GPT and Gemini, including knowledge of prompt engineering and fine-tuning.
- Proven experience in building and implementing Retrieval-Augmented Generation (RAG) systems.
- Familiarity with vector databases, Azure Cloud, LangChain, LlamaIndex, and OCR tools.
- Hands-on experience with PDF-to-text extraction tools such as PDFMiner, PyMuPDF, or PDFPlumber.
- Strong background in machine learning, deep learning, computer vision, NLP, and generative AI models.
- Good to have: Working knowledge of Python and libraries like NumPy, pandas, scikit-learn, TensorFlow, and PyTorch.
Preferred Qualifications:
- BE/MS/PhD in Computer Science, Data Science, Machine Learning, or a related field.