Who You'll Work With
Arista Networks is looking for a skilled professional for our Engineering Productivity team to help maintain and support our rapidly expanding infrastructure and internal user base. The ideal candidate is versatile, can wear many hats, and is enthusiastic about learning new technologies. As part of the software engineering team, you will work with other team members to design, build, and administer secure, scalable, and fault-tolerant tools and infrastructure in a hybrid cloud environment.
Working in the Engineering Productivity (EngProd) group, you will collaborate with other engineers to design, build, scale, and operate the systems that the rest of Arista's development teams use. The EngProd team uses industry-standard systems like Ansible, Jenkins, Kubernetes, Grafana, Spinnaker, MySQL, Elasticsearch, Google Cloud, and Varnish, as well as internal systems that we've built from the ground up to automate CI/CD, testing, analysis, and visualization.
What You'll Do
- Work with the existing k8s admin team to own different aspects of managing a production k8s cluster (e.g., upgrades, monitoring, capacity planning, security, developer experience)
- Proactively monitor, respond to, and enhance alerts, and set up automated alert handling where applicable
- Create and maintain incident response runbooks, working with the service dev teams
- Debug and resolve issues impacting developer user experience and infrastructure stability around the k8s platform
- Adopt current best practices in k8s cluster management. Evaluate and adopt OSS projects that simplify k8s cluster management.
- Set up guidelines and paved paths for service dev teams, improving developer experience around the k8s platform.
- Work with Arista's software engineers to identify bottlenecks and limitations in our workflows, tooling, and infrastructure around k8s, and provide fixes for those problems.
- Engage with third-party vendor support as part of triage
Qualifications
- At least a BSc in Computer Science or Engineering with 3 years of experience, an MS in Computer Science or Engineering with 2 years of experience, a Ph.D. in Computer Science, or equivalent work experience.
- Knowledge of one or more of Go, Python, and JavaScript. Experience with shell scripting sufficient to implement medium-complexity automation workflows.
- Knowledge of Linux (or UNIX).
- Experience in operating software systems at scale.
- Strong understanding of the fundamentals of storage and networking.
- Comfortable with Ansible and GitOps.
- Strong expertise with managing on-prem/bare-metal Kubernetes clusters.
- Applied understanding of software engineering principles.
- Strong problem-solving and software troubleshooting skills.
- Ability to design a solution and implement features independently. Ability to work in small teams.
- Comfortable with security principles and able to study the source code of OSS projects and conduct experiments as necessary to debug issues.
- Proven expertise with debugging complex issues that span the technology stack.
- Experience dealing with network proxies and containerized storage.
Remote Work :
Yes
Employment Type :
Full-time