Senior HPC Cluster Engineer

This job posting is outdated and the position may be filled.
Job Location

Amsterdam - Netherlands

Monthly Salary

Not Disclosed

Vacancy

1 Vacancy

Job Description

The company

Nebius AI is an AI-centric public cloud platform specifically crafted to serve AI models for training and inference.

Our mission is to help ML practitioners concentrate on their core jobs while DevOps, MLOps, and infrastructure-related tasks are handled by us. The idea is to build an ML-specific cloud platform covering the entire ML lifecycle from A to Z: from data preparation and labeling to ML training and inference.

We recognize the potential of ML and AI technologies and aim to provide our future users with the perfect environment to train and fine-tune their models. We are committed to delivering the best user experience and excellent customer support.

Four development hubs:
Nebius is headquartered in the Netherlands, with hubs in Finland, Serbia, and Israel.

Data center in Europe:
Our own data center in Finland features server racks designed in-house for ML-specific high load, with power-efficient solutions including a free-cooling system.

500 professionals:
Our mature team of engineers has a proven track record in developing sophisticated cloud and ML solutions and designing cutting-edge hardware.


The role

We're looking for a Senior HPC Cluster Engineer to contribute to the development of our hyperscaler platform.

The Hypervisor team supports and develops the parts of the Cloud platform that directly affect the KVM hypervisor and QEMU device emulator. We understand the granular details of hardware virtualization and device emulation, paying close attention to performance and protection against untrusted code.


In this position, your responsibility will be to:

  • Improve infrastructure around GPU-accelerated computing
  • Analyze root causes and suggest corrective actions for problems at both large and small scale
  • Add new hardware support throughout the entire infrastructure software stack
  • Detect and fix problems before they occur

We expect you to have:

  • 5 years of professional software development experience
  • 3 years of experience with Linux
  • Fluency in the Go programming language
  • General understanding of the QEMU/KVM virtualization stack

It would be an added bonus if you had:

  • System-level understanding of server architecture, PCIe devices, NICs, Linux OS, and kernel drivers
  • Experience analyzing and tuning performance for a variety of HPC workloads
  • Familiarity with RDMA, RoCE, and InfiniBand
  • Background with Software-Defined Networking and HPC cluster networking
  • Familiarity with deep learning frameworks like PyTorch and TensorFlow


Does all that sound like your kind of challenge? Then join us!



Employment Type

Full Time
