
Senior DevOps Engineer Kubernetes MLOps LLMOps

Experience

5 years

Job Location

Ahmedabad - India

Salary

Not Disclosed

Vacancy

1 Vacancy

Job Description

This is a remote position.

We are seeking a highly skilled Senior DevOps Engineer with deep expertise in Kubernetes, complemented by significant experience in MLOps (Machine Learning Operations) and LLMOps (Large Language Model Operations). This role is ideal for someone with a strong background in managing and architecting SaaS applications in Kubernetes who is passionate about building and optimizing infrastructure to support machine learning and AI-driven applications.


Responsibilities:

  • The Senior DevOps Engineer will play a critical role in ensuring that our systems are highly available, reliable, and scalable. You will architect, build, and monitor cloud-native architectures with Kubernetes and related technologies, particularly in the context of machine learning and AI workloads.
  • You should have a deep understanding of the Software Development Life Cycle, including Continuous Integration and Continuous Deployment (CI/CD) pipeline architecture, particularly as it relates to deploying ML models and AI services in Kubernetes environments.
  • You will assist in the design and operation of critical cloud infrastructure on AWS, with a focus on supporting the unique requirements of machine learning and AI-driven applications, such as model training, deployment, and scaling, all leveraging AWS SageMaker.
  • Collaborate closely with data scientists and ML engineers to create a streamlined, automated build and deployment process for ML models and LLMs in Kubernetes.
  • Implement and manage the infrastructure necessary for the continuous integration, delivery, and monitoring of ML models and AI services, ensuring they are seamlessly integrated into our SaaS applications.
  • Ensure the availability and performance of production systems that run ML-driven services, proactively identifying and resolving issues that may impact model performance or availability.
  • Optimize infrastructure for the efficient training, deployment, and scaling of ML models and LLMs, leveraging Kubernetes GPU clusters and cloud-native tools, including AWS SageMaker.
  • Develop and maintain monitoring and alerting solutions tailored to ML and AI workloads, ensuring that both the infrastructure and deployed models are performing as expected.
  • Troubleshoot and resolve production incidents, ensuring minimal downtime and quick recovery.
  • Participate in on-call rotation as necessary.
  • Ensure the security and compliance of our production systems and data, with a particular focus on protecting sensitive AI and ML data.
  • Mentor and coach junior DevOps engineers.


Requirements


  • Bachelor's degree in Computer Science, Engineering, or a related field.
  • A minimum of 7 years of experience maintaining optimal performance of online production environments, utilizing bare metal, cloud, and container technologies.
  • At least 4 years of experience managing production Kubernetes infrastructure, with exposure to cloud vendor Kubernetes solutions such as EKS, AKS, and GKE.
  • Strong experience with Docker for containerization, including creating and managing Docker images and containers.
  • Strong experience in architecting and managing SaaS applications in Kubernetes, with specific experience in MLOps and LLMOps.
  • Deep understanding of the machine learning lifecycle, including model training, deployment, monitoring, and scaling, particularly using AWS SageMaker.
  • Experience with MLOps tools and frameworks such as Kubeflow, MLflow, or similar, and their integration into Kubernetes environments.
  • Familiarity with LLMOps, including the deployment and management of LLMs in production environments.
  • Solid experience in scripting languages such as Python.
  • Experience with infrastructure deployment and automation tools such as Terraform, CloudFormation, etc.
  • Working knowledge of industry-standard build tooling and CI/CD using GitHub and GitHub Actions.
  • Expertise in monitoring and logging solutions such as Prometheus and Grafana.
  • Good understanding of networking and security concepts.
  • Strong knowledge of Linux systems and shell scripting.
  • Strong communication and collaboration skills, with experience working closely with data scientists and ML engineers.
  • Experience working in an agile environment and understanding of agile methodologies.
  • Certifications such as CKA (Certified Kubernetes Administrator) or CKAD (Certified Kubernetes Application Developer) are a plus.

    Nice to Haves:

  • Experience with workflow orchestration tools like Apache Airflow, particularly for managing complex data pipelines and ML workflows.
  • Experience with GitOps tools such as ArgoCD for managing Kubernetes deployments through version-controlled repositories.
  • Familiarity with GPU acceleration technologies and their integration with Kubernetes for optimizing ML model training and inference.
  • Knowledge of data versioning tools and frameworks like DVC (Data Version Control) in the context of MLOps.
  • Experience with cloud cost optimization strategies, particularly in environments running intensive ML and AI workloads.

Technologies we use:

  • We use numerous AWS services and are expanding into Azure.
  • AWS SageMaker is central to our machine learning model training, deployment, and management processes.
  • Terraform, CloudFormation, Ansible, and Kubernetes are leveraged for our infrastructure deployment and automation.
  • Industry-standard build tooling and CI/CD using GitHub and ArgoCD.
  • A mix of open-source and proprietary technologies tailored to the problems at hand.


Benefits

  • Work from home.
  • Five-day work week.



Education

Bachelor's degree in Computer Science or a related field is preferred.

Employment Type

Full Time

