Lead Site Reliability Engineer - Remote

Purple Drive

Posted on: 11-06-2024

1 Vacancy

Job Location

West - USA

Monthly Salary

Not Disclosed

Job Description

Lead Site Reliability Engineer

Remote

1. SRE Implementations: Look for candidates who have experience implementing SRE principles, including the establishment of Service Level Indicators (SLIs), Service Level Objectives (SLOs), and Error Budgets to ensure system reliability and availability.

2. Observability: Search for keywords related to observability, including familiarity with concepts such as full-stack observability and distributed tracing.

3. Tool Proficiency: Datadog, CloudWatch, and synthetic monitoring tools.

4. Building SRE Culture: Evaluate candidates based on their ability to develop SRE frameworks within organizations, such as creating SRE charters and fostering a culture of reliability and accountability across teams.

5. Automation: Look for candidates with extensive experience in automation, including the automation of repetitive tasks, infrastructure provisioning, and deployment processes, to streamline operations and enhance efficiency.

6. Chaos Engineering: Consider candidates who have experience in Chaos Engineering practices and related tools, demonstrating their ability to proactively identify system weaknesses and improve resilience through controlled experiments.

Job Details:

Lead and mentor a team of SREs to ensure operational excellence and maximize the reliability and availability of client systems.

Minimum 10 years of work experience in DevOps/SRE, including leadership roles.

Architect and design highly scalable and available infrastructure solutions, integrating best practices in reliability engineering and automation.

Collaborate with cross-functional teams (DevOps, Development, IT) to implement SRE principles throughout the software development life cycle.

Establish and manage Service Level Objectives (SLOs) and Service Level Indicators (SLIs) for critical services, monitoring and maintaining performance against defined targets.

Implement and enhance observability, alerting, and incident response processes to proactively address issues and minimize downtime.

Drive continuous improvement initiatives, identifying bottlenecks and optimizing the infrastructure and application stack.

Develop and maintain documentation related to system architecture, configuration, and procedures.

Stay current with industry trends, recommending and adopting new tools and practices to enhance system reliability.

Qualifications:

Strong background in designing and implementing highly available and scalable infrastructure.

Proficiency in scripting and automation using Python or Shell.

Experience with container orchestration platforms, serverless architectures, CI/CD pipelines, and IaC implementations (Ansible and Terraform).

Experience with observability tools (preferred: Datadog, CloudWatch).

In-depth knowledge of cloud computing platforms (preferred: AWS).

Solid understanding of SRE/DevOps principles and practices.

Excellent problem-solving skills, with the ability to troubleshoot complex issues in production environments.

Strong communication and leadership skills, fostering effective collaboration with cross-functional teams.

Relevant certifications in SRE, DevOps, Cloud, etc. are a plus.

Employment Type

Full Time
