Job Description:
Junior to Mid-Level Software Engineer with hands-on experience in high-performance computing (HPC) hardware and a willingness to work 100% onsite in a secure environment. The ideal candidate will possess 3-5 years of Linux administration experience (RHEL, Rocky, Debian, Ubuntu) with a strong skill set in installing, updating, and administering operating systems on bare-metal hardware. The role requires proficiency in managing filesystems, networks, and system security, as well as practical hardware skills including racking systems, connecting storage devices, and managing firmware updates for a variety of hardware vendors (Dell, SM, HPE). Basic knowledge of PCIe cards (RAID, network, GPU), HPC technologies, and high-speed interconnect fabrics (InfiniBand, Omni-Path, RoCE) is essential. Additionally, familiarity with distributed file systems, batch schedulers (PBSpro, SLURM, HTCondor), and cluster management tools is highly desirable. We are looking for someone eager to expand their knowledge of HPC methods, work collaboratively with users, and thrive in a dynamic, fast-paced environment under the guidance of experienced mentors.
Experience: 5 years of relevant professional experience with a Bachelor's, 3 with a Master's, or none with a PhD.
Education: Bachelor's or Master's degree in Software Engineering, Computer Science, Information Systems, or a related field
Security: Current/active TS/SCI clearance with CI polygraph, or willingness to take one. Background investigation required, including at minimum a criminal and credit check, as well as at least three professional references.
Minimum Requirements:
Bachelor's degree in computer science or a related technical discipline, or the equivalent combination of education, technical certifications or training, or work experience
Active TS clearance adjudication with the ability to obtain SCI and polygraph
Willing to work 100% onsite in a secure environment
Basic Linux skills (3-5 years admin): Install/update/administer Linux (RHEL/Rocky/Debian/Ubuntu). Be able to manage bare-metal hardware installs (PXE/manual/automated install of OS).
Working skillset including users, applications, OS packages, kernel/OS configuration, filesystems (XFS, ext4, NFS, FUSE), networks, firewall, security/hardening (STIG, OSCAP)
Hardware skills (3-5 years, various hardware vendors: Dell/SM/HPE): Hands-on experience racking systems and connecting network and direct-attach storage devices.
Lights-out management of systems, local/remote. BMC management/firmware updates/setup. Familiar with storage hardware: NAS, SAN, direct-attach SATA/SAS/NVMe drives
Working knowledge of PCIe cards: RAID/network/GPU/other
Basic knowledge of HPC technologies (at least 1-3 years of work experience) such as:
parallel/distributed file systems (e.g. Lustre, Ceph, GPFS, GlusterFS, BeeGFS)
high-speed interconnect fabrics (e.g. InfiniBand, Omni-Path, RoCE)
HPC batch schedulers (e.g. PBSpro, OpenPBS, SLURM, Torque, HTCondor, SGE)
Basic understanding of how HPC environments function
Able to work with users and help them understand how HPC systems function
Working knowledge of cluster managers (Bright Cluster Manager, now named Base Command Manager; xCAT, Warewulf, Rocks, Scyld)
An Excellent Candidate for this position will meet most of the following experience requirements:
A willingness to gain a better understanding of HPC methods and applications; someone who wants to be mentored
Good analytic and problem-solving skills
Strong understanding of data governance and security practices
Experience working in Agile/Scrum environments
Ability to work collaboratively in a team environment
Strong problem-solving skills and attention to detail
Eagerness to learn and adapt to new technologies and methodologies
General personal traits we know will connect well with the team:
Superior communication skills
A positive, willing attitude
An ability to think on your feet and solve problems quickly
Adaptability to learn new methodologies and technologies
Comfortable working in an agile team environment
Technology and methodology agnostic, but willing to use the tools a requirement calls for
Enjoys coaching and teaching