Title: DevOps/Site Reliability Engineer
Contract: 3-month contract, with possible extension
Schedule: 100% Remote
Responsibilities:
Design and Infrastructure Development
- Design and implement highly available, scalable, and fault-tolerant infrastructure.
- Collaborate with engineering teams to define and implement reliability standards and best practices.
Automation and Operations
- Automate infrastructure provisioning, configuration, and deployment processes to streamline operations.
- Collaborate with other software engineers to design and implement deployment strategies using automated continuous integration and continuous delivery pipelines.
Monitoring and Performance Management
- Monitor system performance and identify potential issues to ensure uptime and optimal performance.
- Collaborate with software engineering teams to improve system reliability through automated testing, fault tolerance, and disaster recovery planning.
Incident Management
- Lead incident management efforts, overseeing response processes and coordinating with cross-functional teams.
- Design and implement incident response playbooks and escalation procedures for timely and effective resolution.
- Conduct post-incident reviews to identify root causes and implement preventative measures.
Experience
- 5 years of experience as a Site Reliability Engineer or in a similar role.
- 4 years of experience performing engineering and support in Azure.
- 4 years of experience supporting enterprise-level, complex applications and platforms in production.
- 3 years of designing and building complex observability solutions using industry-standard tools or custom-built solutions.
- 2 years of building and supporting CI/CD in GitHub.
Technical Skills
- Proficiency in programming languages such as Python, Go, Java, or C#, with a focus on automation and scripting.
- Strong proficiency in Infrastructure as Code (IaC) principles using tools like Terraform.
- Experience with container orchestration tools such as Kubernetes and containerization technologies like Docker.
- 3 years of experience working with configuration and monitoring technologies such as Ansible, Grafana, Elastic, Splunk, and Prometheus.
Desired Qualifications
- A degree in computer science or engineering
- Experience with Agile Scrum (Daily Standup, Sprint Planning, and Sprint Retrospective meetings)
- Experience with datastores such as relational databases, NoSQL, and other cloud storage services