Designation: Site Reliability Engineer
Location: Research Triangle Park (RTP), North Carolina, or San Jose, CA
Responsibilities include:
- Participate in Agile Scrum
- Take accountability for system reliability from design through end of life
- Automate operational capabilities using Python, Ansible, Terraform, Go, etc.
- Deliver automation through CI/CD pipelines, chatbots, and similar channels
- Take part in the on-call rotation, including diagnosis, fixes, and root cause analysis (RCA) for incidents and support escalations
- Own and improve monitoring and alerting
Technical Expertise:
- Experience managing enterprise-grade Kubernetes clusters (Red Hat OpenShift preferred) and Anthos
- Experience working with Public Cloud offerings (AWS, GCP)
- Advanced knowledge of Kubernetes, Docker, Terraform, Ansible, Jenkins, GitOps, Git, and Linux
- Experience across the software development lifecycle, including design, development, testing, packaging, and deployment, using Python or Go
- Preferred: Kubernetes or OpenShift experience or certification
Non-Technical Requirements:
- Familiarity with Agile software development practices and Jira
- Work with geographically distributed teams
- Understand IT processes, including architecture, design, implementation, and operations
- Self-motivated, and able and willing to help wherever help is needed
- Able to build relationships, culturally sensitive, aligned on goals, and quick to learn
- Have a knack for finding automation opportunities within the team