Title: Data Scientist
Location: Remote
Duration: 1 year
Responsibilities:
- Improve the genomics LLMs for gene expression profiling by incorporating new sequencing data types, adopting new foundation models, and applying new fine-tuning techniques.
- Develop the generative AI tool for automated genome editing design based on the improved LLM.
- Collect and analyze large, complex data sets using statistical analysis, machine learning, and other data science techniques.
- Develop and implement predictive models and algorithms.
- Communicate insights and findings to nontechnical stakeholders.
- Collaborate with business analysts, product managers, and other stakeholders to identify business opportunities and inform product development.
- Build and maintain data pipelines and ETL processes.
- Ensure data quality, consistency, and accuracy.
- Develop and maintain documentation and processes for data governance and security.
Qualifications:
- Proficiency with Python, PyTorch, Linux, Docker, Kubernetes, and Jupyter.
- Expertise in deep learning, transformers, natural language processing (NLP), and large language models (LLMs).
- Bachelor's degree in Computer Science, Statistics, Mathematics, or a related field.
- Strong programming skills in languages such as Python or R.
- Knowledge of statistical analysis and machine learning techniques.
- Experience with data visualization tools and techniques.
- Strong problem-solving and analytical skills.
- Excellent communication and collaboration skills.
Preferred: Experience with genomics data and molecular genetics, and with distributed computing tools such as Ray, Dask, or Spark.
Additional Details:
MUST have demonstrated expertise in training and evaluating transformer models such as BERT and its derivatives.
Next-generation Artificial Intelligence for Genomics will use more complex data types and be applied to new crop contexts.