AI/ML Data Engineer

Employer Active

Job Location

USA

Monthly Salary

Not Disclosed

Job Description

Title: AI/ML Data Engineer

Location: Remote

Duration: 1 year

We are migrating our Clinical Trial Management System (CTMS) and are seeking a skilled AI/ML Data Engineer to join our team. As an AI/ML Data Engineer, you will be responsible for developing and managing a new OCR (Optical Character Recognition) and document classification model. Our existing model, which processes around 200 studies and 75,000 documents annually, has seen a significant decline in accuracy. You will play a crucial role in rebuilding this model to adapt to new document templates and structures, ensuring the accuracy and efficiency of our document processing workflow.

Key Responsibilities:

  • Develop and Implement New OCR and Classification Models:
    • Rebuild the OCR and document classification model to achieve high accuracy and reduce manual intervention.
    • Use NLP techniques to enhance the model's performance.
    • Ensure the model can effectively classify documents as valid or invalid and flag those that require manual review.
  • Data Analysis and Model Evaluation:
    • Analyze existing data and model performance to identify areas of improvement.
    • Conduct rigorous testing and validation of the new model to ensure it meets the required accuracy standards.
  • Collaboration and Integration:
    • Work closely with the data engineering team to integrate the new model into the CTMS migration.
    • Collaborate with cross-functional teams to understand document structures and ensure the model aligns with business needs.
  • Continuous Improvement:
    • Monitor the model's performance post-deployment and make necessary adjustments to maintain high accuracy.
    • Stay updated with the latest advancements in OCR, NLP, and machine learning to continually improve the model.

Qualifications:

  • Educational Background:
    • Bachelor's or Master's degree in Computer Science, Data Science, Machine Learning, or a related field.
  • Experience:
    • Proven experience in developing and deploying OCR and NLP models.
    • Experience with machine learning frameworks and libraries.
    • Prior experience in handling large volumes of documents and ensuring data quality.
  • Technical Skills:
    • Hands-on development with the Microsoft stack (Azure Data Factory, SSIS, Databricks) as well as Python and SQL.
    • Proficiency in Python and relevant libraries.
    • Strong understanding of data preprocessing, feature extraction, and model evaluation.
    • Experience with database management and integration.
  • Soft Skills:
    • Ability to work independently and collaboratively in a fast-paced environment.
    • Excellent problem-solving skills and attention to detail.
    • Strong communication skills to collaborate with team members and stakeholders.
    • Highly communicative and inquisitive.
    • A driver with a high level of self-motivation.
    • Extreme accountability and ownership.
    • A hands-on executor rather than a theorist.
    • Critical thinking.
  • Experience in the healthcare or clinical research industry.
  • Familiarity with CTMS and document management systems.
  • Knowledge of cloud platforms and DevOps practices.
  • Experience with Validated Systems.

Additional Details

Recruiters: here are the most recent notes from the manager regarding the role and what he is looking for:

  • The current OCR and classification model is written in Python using OCR/NLP
    • Accuracy was once as high as 98% but is now below 60%, requiring more manual intervention
  • The existing process covers 200 studies and around 75k documents per year
  • Need a data scientist to build a new OCR and classification model as part of the CTMS migration
    • Classifying documents as valid or not
    • Does the document look like the others? If not, it goes to manual review.
  • Document templates and structures have changed since the original model was built, so it needs to be rebuilt
  • Primary responsibility is to rebuild and manage the module
    • Data engineering work is being done by other members of the team, but this role will need to work with them

Employment Type

Remote

Key Skills

  • Apache Hive
  • S3
  • Hadoop
  • Redshift
  • Spark
  • AWS
  • Apache Pig
  • NoSQL
  • Big Data
  • Data Warehouse
  • Kafka
  • Scala
