Role: Site Reliability Engineer
Location: Mountain View, CA & Bellevue, WA
Experience: 10 years
W2 Contract
The ideal candidate will have experience with system operations and running large-scale, massively distributed infrastructure.
Responsibilities:
Data monitoring and alerting, data quality assurance, and anomaly detection.
Document team processes and policies, including methods of engagement and SLOs.
Analyze, design, and implement solutions at the system level to remove bottlenecks and improve edge service performance.
Implement monitoring and alerting to improve issue detection and response.
Work in a fast-paced environment. Participate in technical operations and rotations in response to performance and reliability issues.
Participate in on-call rotations responsible for resolving or escalating incoming events.
Maintain and operate a Linux and Kubernetes environment.
Qualifications:
3 years of experience working with Unix/Linux systems, from kernel to shell and beyond, including system libraries, file systems, and client-server protocols.
2 years of experience coding Python scripts for platform operations.
Experience in networking technologies such as TCP/IP, BGP, DNS, etc. in a carrier-grade environment.
Experience in developing and operating one or more of the following systems: OpenStack, Kubernetes, Nginx, IPVS, ELK stack, Hadoop, etc.
Bachelor's degree or above in Computer Science or a related field, with at least 2 years of related work experience.
Please be advised that all candidates will be fully screened by HPE and our customer before final selection.
The customer has a hybrid office policy. Contractors are remote, but the policy may change at some point in the future. Candidates must be in the Pacific time zone and within commuting distance of one of the following cities:
Location 1: Culver City
Culver City, CA 90230
Location 2: Mountain View
Mountain View, CA 94041
Location 3: Seattle
Bellevue, WA 98004
shell scripting, distributed infrastructure, openstack, elk stack, data quality assurance, linux, hadoop, linux systems, system operations, networking technologies, anomaly detection, nginx, python scripting, kubernetes, data monitoring, reliability, alerting