Job title: HPC Systems R&D Engineer
Our AI Advanced Computing Labs is looking for an extraordinary HPC Systems R&D Engineer to join its team and develop system-level HPC technologies that will form the foundation of the next-generation clusters used in our tools, which leverage AI to push the boundaries of process control for semiconductor manufacturing. These technologies will be developed and demonstrated on on-prem clusters that serve as testbeds for our next-generation tools.
Your Day-to-Day Roles
- Expose limitations in existing CPU & GPU cluster-based solutions for deploying AI-based workloads at scale on on-prem & cloud infrastructure.
- Develop distributed frameworks and system-level solutions that scale out image-processing & AI workloads from a single GPU to multi-node, multi-GPU clusters.
- Install, benchmark, and evaluate pre-release hardware for early-stage prototyping, identifying (or developing) relevant workloads.
Your Expected Background
- Master's / PhD in Computer Science or a related field; bachelor's degree holders with relevant experience and an extraordinary track record will also be considered.
- Deep understanding of operating systems, computer networks, and high-performance applications.
- Good mental model of the architecture of modern distributed systems composed of CPUs, GPUs, and accelerators.
- Experience deploying deep-learning frameworks such as TensorFlow and PyTorch on large-scale on-prem or cloud infrastructure.
- Strong background in modern and advanced C++ concepts.
- Strong scripting skills in Bash, Python, or similar.
- Good communication skills.
Things to Make us go Wow!
- Experience with model development in DL frameworks such as TensorFlow and PyTorch.
- Experience building open-source operating systems and software stacks on pre-release hardware.
- Solid understanding of container infrastructure such as Docker or Singularity, and of Kubernetes.
- Active participation in C++ standards bodies or similar.