Job description:
- Work in a cross-functional team, bringing reliability expertise to a product or product area.
- Apply reliability engineering practices with support from SRE governance teams.
- Ensure delivery quality and provide KPI reporting.
- Collaborate closely within product teams to ensure predictable operations and minimal disruption to production.
- Perform technical analysis and troubleshooting of complex issues and incidents in production.
- Improve monitoring performance by focusing on preventive measures.
- Work within a cross-functional product team to monitor, manage, and resolve issues in the supported applications.
- Continuously improve proactive monitoring, housekeeping, and automation to detect and prevent incidents early.
- Ensure environment stability and reliability.
- Automate processes affecting development and production by leveraging tools and building scripted solutions.
Skills:
- Experience in Site Reliability Engineering, covering maintenance and operations and/or development.
- Strong working experience in DevOps practices (CI/CD, etc.).
- Experience with solutions architecture and the ability to quickly pinpoint the root causes of issues.
- Experience with ITIL support processes and ITSM tools (e.g. ServiceNow) in a microservices context.
- Experience with monitoring tools (Splunk, Grafana, etc.).
- Understanding of performance engineering (Application Reliability).
- Experience in building CI/CD workflows using GitHub Actions.
- Knowledge of Azure DevOps and/or other cloud environments is nice to have.
- Experience in provisioning infrastructure resources using Infrastructure as Code (Terraform, Ansible).
- A passion for problem solving with strong analytical capabilities.
- Experience working with SRE metrics such as SLIs, SLOs, and error budgets.
- Experience with managed cloud Kubernetes services (e.g. AKS, GKE).
Required cloud certification: Azure AKS
Remote work: No