Job title: HPC Systems R&D Engineer
KLA's AI Advanced Computing Labs is looking for an extraordinary HPC Systems R&D Engineer to join its team and develop system-level HPC technologies that will form the foundation of next-generation clusters used in KLA tools, which leverage AI to push the boundaries of process control for semiconductor manufacturing. The technologies will be developed and demonstrated on on-prem clusters that serve as testbeds for next-generation KLA tools.
Your Day-to-day Roles
Expose limitations in existing solutions based on clusters of CPUs and GPUs when deploying AI-based solutions on on-prem and cloud infrastructures at scale.
Develop distributed frameworks and system-level solutions that enable scaling out image-processing and AI workloads from a single GPU to multi-node clusters with multiple GPUs.
Install and benchmark pre-release hardware for early-stage evaluation and prototyping by identifying (or developing) relevant workloads.
Your Expected Background
Master's / PhD in Computer Science or a related field; bachelor's degree holders with relevant experience and an extraordinary track record will also be considered.
Deep understanding of operating systems, computer networks, and high-performance applications.
Good mental model of the architecture of a modern distributed system comprising CPUs, GPUs, and accelerators.
Experience with deployments of deep-learning frameworks based on TensorFlow and PyTorch on large-scale on-prem or cloud infrastructures.
Strong background in modern and advanced C++ concepts.
Strong scripting skills in Bash, Python, or similar.
Good communication.
MUST-HAVE
Experience with model development on DL frameworks such as TensorFlow and PyTorch
Experience with building open-source operating systems and software stacks on pre-release hardware.
Solid understanding of container infrastructure such as Docker or Singularity, and of Kubernetes.
Active participation in C++ standards bodies or similar.
Skills: model development, TensorFlow, PyTorch, Bash, Python, C++, CPU/GPU architecture, parallel computing programming, optimization
Full Time