ML Engineer

Employer Active

Job Location

Alexander City - USA

Monthly Salary

Not Disclosed

Job Description

Onsite, local candidates are preferred.
Remote work is open for exceptional profiles.
Minimum 10 years of MLOps experience.
Key Responsibilities:
Work with the client's AI/ML Platform Enablement team within the eCommerce Analytics team. The broader team is currently on a transformation path, and this role will be instrumental in enabling the broader team's vision.
Work closely with data scientists to help productionize models and maintain them in production.
Deploy and configure Kubernetes components for the production cluster, including API Gateway, Ingress, Model Serving, Logging, Monitoring, Cron Jobs, etc.
Improve the model deployment process for MLE for faster builds and simplified workflows.
Be a technical leader on various projects across platforms and a hands-on contributor to the entire platform's architecture.
System administration, security, compliance, and internal tech audits.
Responsible for leading operational excellence initiatives in the AI/ML space, which includes efficient use of resources, identifying optimization opportunities, forecasting capacity, etc.
Design and implement different flavors of architecture to deliver better system performance and resiliency.
Develop capability requirements and a transition plan for the next generation of AI/ML enablement technology, tools, and processes to enable the client to efficiently improve performance with scale.

Tools/Skills (hands-on experience is a must):
Administering Kubernetes: ability to create, maintain, scale, and debug production Kubernetes clusters as a Kubernetes administrator, and in-depth knowledge of Docker.
Ability to transform designs from the ground up and lead innovation in system design
Deep understanding of data center architectures, networking, storage solutions, and system performance at scale
Have worked on at least one Kubernetes cloud offering (EKS/GKE/AKS) or on-prem Kubernetes (native Kubernetes, Gravity, MetalK8s)
Programming experience in Python, Node, Golang, or Bash
Ability to use observability tools (Splunk, Prometheus, and Grafana) to look at logs and metrics to diagnose issues within the system.
Experience with Seldon Core, MLflow, Istio, Jaeger, Ambassador, Triton, PyTorch, TensorFlow/TF Serving is a plus.
Experience with distributed computing and deep learning technologies such as Apache MXNet, CUDA, cuDNN, TensorRT
Experience hardening a production-level Kubernetes environment (memory/CPU/GPU limits, node taints, annotations/labels, etc.)
Experience with Kubernetes cluster networking and Linux host networking
Experience scaling infrastructure to support high-throughput, data-intensive applications
Background with automation and monitoring platforms, MLOps, and configuration management platforms
Education & Experience:
5 years of relevant experience in roles with responsibility over data platforms and data operations, dealing with large volumes of data in cloud-based distributed computing environments.
Graduate degree preferred in a quantitative discipline (e.g., computer engineering, computer science, economics, math, operations research).
Proven ability to solve enterprise-level data operations problems at scale that require cross-functional collaboration for solution development, implementation, and adoption.

Employment Type

Full Time
