What will you be responsible for:
You will lead the SRE and Cost Engineering team. This team is the custodian of four pillars: reliability, scalability, performance, and cost. You will be responsible for meeting the reliability OKRs and for driving the projects and initiatives that align with these well-architected pillars.
What would your workweek look like :
Define team goals, objectives, and KPIs to measure OKRs and operational excellence.
Lead a high-performing team of SREs.
Oversee incident response, triaging, and resolution to minimize Mean Time to Recovery (MTTR).
Lead post-incident reviews and ensure long-term fixes are implemented.
Foster a culture of blamelessness and continuous improvement within the team, encouraging open communication and learning from failures.
Assess architecture and design gaps in the products and address them by building tools and solutions that serve the entire product and platform engineering team.
Keep up with emerging technologies and best practices in SRE and DevOps to stay ahead in the field.
Who are we looking for:
10-12 years of experience in SRE, handling performance, architecture, and design of applications.
Strong understanding of cloud computing, networking, Linux systems administration, containerization (e.g. Docker, Kubernetes), and infrastructure as code (e.g. Terraform, Ansible).
Understanding of SRE principles, including SLOs, SLIs, SLAs, and error budgets.
Experience in managing incidents and retrospectives.
Experience in cloud cost management and cloud architecture.
In-depth knowledge of cloud computing platforms (e.g. AWS).
Experience with infrastructure as code (IaC) tools and practices.
Experience with monitoring, logging, and telemetry tools such as New Relic, Splunk, ELK, Nagios, SolarWinds, Prometheus, AWS CloudWatch, Datadog, and OpenTelemetry.
Expertise in designing, creating, and supporting automation, and in identifying opportunities for self-healing systems, automated deployments, and other scalable solutions.
Experience in performance engineering, including identifying opportunities for performance tuning and profiling.
Experience in prioritizing and managing technical roadmaps.
Strong skills in stakeholder communication, requirements gathering, and documentation.
Ability to lead cross-functional teams and build consensus around reliability goals.
Improve operational processes and team practices
Provide technical and people leadership to the Site Reliability Engineering teams by facilitating one-on-one, team, and performance review meetings.
Problem-solving: ability to analyze complex systems, troubleshoot issues, and devise effective solutions.
Excellent leadership and communication skills, with the ability to inspire and motivate cross-functional teams.
Experience in dealing with the intricacies of large-scale distributed systems and ensuring their reliability and performance.
Additional Information :
At Freshworks, we are creating a global workplace that enables everyone to find their true potential, purpose, and passion, irrespective of their background, gender, race, sexual orientation, religion, and ethnicity. We are committed to providing equal opportunity for all and believe that diversity in the workplace creates a more vibrant, richer work environment that advances the goals of our employees, communities, and the business.
Remote Work :
No
Employment Type :
Full-time