Site Reliability Engineer (Remote)

Job Location

Alexander City - USA

Monthly Salary

Not Disclosed

Job Description

We are seeking a highly skilled Site Reliability Engineer (SRE) to join our team dedicated to maintaining and improving the reliability, performance, and availability of our services. As an SRE, you will bridge the gap between development and operations, optimizing our infrastructure and keeping our systems running smoothly and efficiently. You will play a critical role in designing and implementing scalable systems, monitoring their performance, and responding quickly to incidents. You will also collaborate closely with software developers to enable faster, more reliable releases while maintaining a high level of service quality. Proactive problem-solving and a deep understanding of industry best practices will be essential as you work to improve the overall customer experience through effective system design and management.

The ideal candidate will have a strong technical background paired with excellent communication skills, enabling them to work seamlessly across teams and drive initiatives that enhance operational excellence. If you are passionate about building reliable systems and thrive in a fast-paced environment, we would love to hear from you.

Responsibilities
  • Design and implement improvements to the reliability and performance of our services.
  • Develop and maintain monitoring, alerting, and incident management tools to ensure system health.
  • Collaborate with software engineering teams to build and deploy reliable services and features.
  • Manage production systems with a focus on high availability and uptime.
  • Troubleshoot and resolve complex production issues, providing root cause analysis and recommendations.
  • Optimize system performance through rigorous testing, performance tuning, and incident response.
  • Continuously evaluate and integrate new technologies to improve the scalability and reliability of our systems.

Requirements
  • Bachelor's degree in Computer Science, Engineering, or a related field.
  • Proven experience in Site Reliability Engineering or a similar role in a production environment.
  • Strong understanding of cloud platforms (AWS, Azure, GCP), containers (Docker), and container orchestration systems (Kubernetes).
  • Proficiency in scripting and programming languages such as Python, Go, or Java.
  • Experience with monitoring tools such as Prometheus, Grafana, or similar.
  • Familiarity with CI/CD pipelines and automation/configuration management tools (Ansible, Terraform).
  • Strong analytical and troubleshooting skills, with a passion for improving system performance.

Employment Type

Full Time

Disclaimer: Drjobpro.com is only a platform that connects job seekers and employers. Applicants are advised to conduct their own independent research into the credentials of the prospective employer. We always make certain that our clients do not request money payments, and we advise against sharing any personal or bank-related information with any third party. If you suspect fraud or malpractice, please contact us via our Contact Us page.