5 years of experience with different flavours of Linux, such as SLES, RHEL, and Ubuntu/Debian.
Experience in managing HPC clusters, with a good understanding of their architecture.
Skilled in installation and configuration of various applications on Linux.
Install, administer, and maintain hardware, system software, networking, accounts, and security measures on VMware configurations.
Diagnose and resolve system and performance issues.
Should have experience in drafting technical SOPs, action plans, and knowledge documents.
Should have a good understanding of different cloud platforms.
Restore system integrity as quickly as possible following an outage to minimize downtime.
Triage and resolve user-submitted tickets, especially those relating to the infrastructure.
Track resource usage using monitoring and queuing software.
Willingness to assist peers is an added advantage.
Technical Skills:
Demonstrated expertise with Linux system administration, including OS, networking, storage, Docker, and security.
Experience with high-speed networking such as InfiniBand and 10/40 Gigabit Ethernet.
Familiarity with large storage systems (Lustre, GPFS, others).
Experience with HPC cluster managers (xCAT, HPCM, Bright Cluster Manager).
Experience in server hardware patching and troubleshooting.
Experience managing HPC clusters and GPUs.
Experience using and supporting job schedulers such as SLURM, PBS, or others.
Familiar with shell/Python scripting and Ansible.
Familiar with monitoring tools like Grafana/Nagios/OpsRamp.
Familiar with virtualization technologies such as KVM and VMware vCenter.