Site Reliability Engineer 1

Employer Active

1 Vacancy
Experience

5 years

Job Location

Bengaluru, India

Monthly Salary

Not Disclosed

Job Description

We are looking for a skilled Site Reliability Engineer (SRE) to join our dynamic team. The ideal candidate will ensure the reliability, scalability, and performance of our systems while implementing DevOps best practices. You will work on infrastructure automation, incident management, and observability to maintain a stable production environment.

Key Responsibilities:
Infrastructure & Operations: Manage, administer, and optimize our Linux-based infrastructure, ensuring high availability, scalability, and performance for mission-critical applications.
Incident Management: Lead and manage incident response, troubleshoot complex issues across production systems, and ensure timely resolution in line with SLA, SLO, and SLI targets.
Linux Administration: Perform in-depth system administration tasks for Linux servers, including user management, system tuning, performance optimization, patching, and log analysis.
Automation & Configuration Management: Automate repetitive tasks, manage Docker containers, and implement configuration management and provisioning using Terraform and other automation tools.
Monitoring & Performance Tuning: Use Grafana, Prometheus, and other monitoring tools to track system health and performance, taking proactive measures to maintain reliability and reduce downtime.
Runbook & Documentation: Develop and maintain runbooks, detailed troubleshooting guides, and operational documentation for incident response and system recovery.
Debugging & Troubleshooting: Perform advanced debugging and root-cause analysis to diagnose complex system issues, with a focus on minimizing downtime and improving operational stability.
Collaboration: Work closely with cross-functional teams, including development, QA, and operations, to ensure reliability and performance standards for new features and releases.
Capacity Planning & Optimization: Plan for future infrastructure needs, scale systems as required, and optimize resource utilization to meet growing demands.

Skills and Qualifications:
5 years of experience in Linux administration and networking, including deep expertise in system configuration, user management, kernel tuning, log analysis, and performance troubleshooting.
Hands-on experience with Docker and container orchestration (e.g. Kubernetes).
Solid knowledge of Terraform for infrastructure automation, provisioning, and management.
Proficiency with monitoring tools like Grafana, Prometheus, and Opsgenie to track performance and uptime.
Experience in incident management, ensuring that issues are resolved promptly while maintaining SLA, SLO, and SLI metrics.
Expertise in debugging and resolving complex technical issues in distributed systems, with a focus on minimizing downtime.
Proven ability to write and maintain runbooks and operational procedures for troubleshooting and system recovery.
Experience in data center management and ensuring 24/7 availability of production infrastructure.
Strong understanding of automation tools (e.g. Terraform, Ansible) and continuous improvement practices.
Excellent communication and teamwork skills, with the ability to collaborate effectively across departments.
Linux networking is the primary requirement; the infrastructure is Ubuntu-based. Strong Linux skills are required, e.g. hardening.
Ansible for configuration management; Jenkins and GitLab for deploying infrastructure.
Automation of tasks using shell scripting and Python.
ELK for logging; Prometheus and Grafana for monitoring.
Databricks and Tableau are also in use, so SQL skills are required.
An understanding of Docker containers is a must.

Preferred Qualifications:
Experience with cloud platforms (AWS, Azure, GCP).
Familiarity with CI/CD pipelines and tools like Jenkins, Git, or similar.
Exposure to Kubernetes and container orchestration technologies.
Certifications in Linux administration, SRE, DevOps, or cloud technologies.



Employment Type

Full Time
