This role is for one of the Weekdays clients
We are looking for an experienced MLOps Engineer to join our team and work at the intersection of machine learning and DevOps. This role is ideal for someone who is passionate about deploying, monitoring, and managing ML models in production environments. You will collaborate closely with data scientists, machine learning engineers, and DevOps teams to streamline and automate machine learning workflows, ensuring scalability, reliability, and efficiency.
Key Responsibilities:
- ML Model Deployment: Design and implement end-to-end machine learning pipelines, including model deployment, monitoring, and maintenance.
- Infrastructure Automation: Develop and maintain infrastructure as code (IaC) to automate the setup of ML environments and CI/CD pipelines for model deployment.
- Monitoring and Maintenance: Monitor model performance, data drift, and system health, ensuring optimal operation of ML systems in production.
- Collaboration: Work closely with data scientists and machine learning engineers to streamline processes and deploy models quickly and effectively.
- Tooling and Frameworks: Implement and integrate MLOps tools for model versioning, data management, and workflow orchestration.
- Optimization: Identify and resolve bottlenecks in ML workflows to improve efficiency, scalability, and reliability.
Required Skills and Experience:
- DevOps & Automation: Strong experience with DevOps principles and tools, including CI/CD (Jenkins, GitLab CI, or similar), containerization (Docker), and orchestration (Kubernetes).
- Cloud Platforms: Hands-on experience with cloud services like AWS, GCP, or Azure for model deployment and infrastructure management.
- ML Lifecycle Management: Familiarity with MLOps frameworks and tools such as MLflow, Kubeflow, TFX, or SageMaker.
- Programming Skills: Proficiency in Python and/or Java, with an understanding of ML libraries and frameworks (TensorFlow, PyTorch, scikit-learn).
- Data Management: Knowledge of data versioning, data pipelines, and data management solutions to support the model lifecycle.
- Monitoring and Logging: Experience with monitoring tools (Prometheus, Grafana) and logging frameworks to ensure model performance and system reliability.