Staff Engineer - SRE

Job Location

Chennai - India

Monthly Salary

Not Disclosed

Vacancy

1 Vacancy

Job Description

Responsibilities: 

  • The Site Reliability Engineering (SRE) team is responsible for the reliability, scalability, stability, and performance of systems and services.

  • They work with cross-functional teams to design, build, and maintain systems, and they troubleshoot issues when they arise. They bridge the gap between development and operations teams.

  • They work closely with business teams to define Service Level Objectives (SLOs) and Service Level Agreements (SLAs) for critical systems. They also monitor and maintain the uptime of these systems in line with the defined SLOs and SLAs.

  • They deploy and manage monitoring tools to gain insights on system health and performance.

  • They analyze performance, identify bottlenecks, and implement solutions to improve a system's scalability and latency.

  • They develop scripts and implement tools and automation frameworks to reduce the manual effort involved in deployment, monitoring, and scaling.

  • They work with development teams to design and develop observability practices such as logging, metrics, and tracing. They aim to diagnose and troubleshoot issues proactively.

  • They create actionable alerts on monitoring systems to ensure rapid response for potential production incidents.

  • They forecast resource needs and provision adequately for current and future demand.

  • They design and execute chaos experiments to test the resilience of systems to failure.

  • They own, define, and implement the Disaster Recovery (DR) processes for systems.

  • They also conduct planned and unplanned mock DR drills to test response preparedness for production incidents.

  • They ensure that security best practices are followed and implemented during the design and operation of systems.

  • They also own and maintain documentation of processes, playbooks, and systems.

  • They publish KPI reports and other system health updates on a regular basis to the business.


Requirements:

  • Must have a Bachelor's degree, preferably in CS or a related field, or equivalent experience

  • Must have 12 years of overall IT experience

  • Must have 7 years of proven work experience as a Senior Site Reliability Engineer or in a similar position.

  • Must have 5 years of AWS Cloud experience, with an AWS certification such as DevOps Engineer, SysOps, or Security.

  • Must have AWS experience: 3 years of experience using a broad range of AWS technologies (e.g., EC2, RDS, ELB, S3, VPC, CloudWatch, and monitoring tools) to develop and maintain an AWS-based cloud solution, with an emphasis on cloud security best practices.

  • Must have 2 years of experience with CDN and/or cache systems such as Fastly, Akamai, CloudFront, etc.

  • Proven understanding of, and strong experience with, cloud deployments (AWS, Docker, Kubernetes)

  • Knowledge of IaC provisioning tools such as Terraform, Chef, and Ansible, and of scripting with Shell, Groovy, Python, etc.

  • Experience with monitoring systems such as CloudWatch, New Relic, Datadog, Splunk, and the ELK stack.

  • Experience managing cloud network resources (AWS preferred) such as CloudWatch, VPC, URL proxies, private links, DNS, ACLs, firewalls, and C2S access points.

  • Platform or application engineering and operational knowledge of CI/CD tooling such as GitHub Actions, Jenkins, etc.

  • Experience with other tooling and technologies such as JIRA, Bitbucket, Jenkins, Fortify, SonarQube, Nexus, and Nexus IQ.

  • Experience with configuration automation tools such as Puppet, Ansible, Chef, or Salt

  • Scripting Skills: Strong scripting (e.g., Bash and Python) and automation skills.

  • Operating Systems: Windows and Linux system administration. 

  • Problem Solving: Ability to analyze and resolve complex infrastructure, resource, and application deployment issues.

  • Strong attention to detail. Excellent verbal and written communication skills. Strong documentation skills.

Good To Have:

  • Experience with Terraform/Ansible/Chef/Puppet 

  • Experience with GitHub Actions

  • Experience with CloudFront and Fastly

  • Oversees team members performing these functions 

  • Anticipates problems and future technical needs and takes necessary steps to address issues. 

  • Works primarily with server-side technologies and is comfortable with client-side work when required

  • Enthusiastically follows technology trends and software engineering best practices

Perks:

  • Day off on the 3rd Friday of every month (one long weekend each month)

  • Monthly Wellness Reimbursement Program to promote health and well-being

  • Paid paternity and maternity leave

Remote Work:

Yes


Employment Type:

Full-time

Department / Functional Area

Engineering

Disclaimer: Drjobpro.com is only a platform that connects job seekers and employers. Applicants are advised to conduct their own independent research into the credentials of the prospective employer. We always make certain that our clients do not endorse any request for money payments, so we advise against sharing any personal or bank-related information with any third party. If you suspect fraud or malpractice, please contact us via our Contact Us page.