Lead Site Reliability Engineer - Remote

Job Location

West - USA

Monthly Salary

Not Disclosed

Vacancy

1 Vacancy

Job Description

Lead Site Reliability Engineer

Remote

1. SRE Implementations: Look for candidates who have experience implementing SRE principles, including the establishment of Service Level Indicators (SLIs), Service Level Objectives (SLOs), and Error Budgets, to ensure system reliability and availability.

2. Observability: Search for keywords related to observability, including familiarity with concepts such as full-stack observability and distributed tracing.

3. Tool Proficiency: Datadog, CloudWatch, and Synthetic Monitoring tools.

4. Building SRE Culture: Evaluate candidates based on their ability to develop SRE frameworks within organizations, such as creating SRE charters and fostering a culture of reliability and accountability across teams.

5. Automation: Look for candidates with extensive experience in automation, including the automation of repetitive tasks, infrastructure provisioning, and deployment processes, to streamline operations and enhance efficiency.

6. Chaos Engineering: Consider candidates who have experience in Chaos Engineering practices and related tools, demonstrating their ability to proactively identify system weaknesses and improve resilience through controlled experiments.

Job Details:

Lead and mentor a team of SREs to ensure operational excellence and maximize the reliability and availability of client systems.

Minimum 10 years of work experience in DevOps/SRE, including leadership roles.

Architect and design highly scalable and available infrastructure solutions, integrating best practices in reliability engineering and automation.

Collaborate with cross-functional teams (DevOps, Development, IT) to implement SRE principles throughout the software development life cycle.

Establish and manage Service Level Objectives (SLOs) and Service Level Indicators (SLIs) for critical services, monitoring and maintaining performance against defined targets.

Implement and enhance observability, alerting, and incident response processes to proactively address issues and minimize downtime.

Drive continuous improvement initiatives, identifying bottlenecks and optimizing the infrastructure and application stack.

Develop and maintain documentation related to system architecture, configuration, and procedures.

Stay current with industry trends, recommending and adopting new tools and practices to enhance system reliability.

Qualifications:

Strong background in designing and implementing highly available and scalable infrastructure.

Proficiency in scripting and automation using Python or Shell.

Experience with container orchestration platforms, serverless architectures, CI/CD pipelines, and IaC implementations (Ansible and Terraform).

Experience with observability tools (preferred: Datadog, CloudWatch).

In-depth knowledge of cloud computing platforms (preferred: AWS).

Solid understanding of SRE/DevOps principles and practices.

Excellent problem-solving skills, with the ability to troubleshoot complex issues in production environments.

Strong communication and leadership skills, fostering effective collaboration with cross-functional teams.

Relevant certifications in SRE, DevOps, Cloud, etc. are a plus.

Employment Type

Full Time

Company Industry

Accounting & Auditing
