Job Title: Site Reliability Engineer (US Citizens only)
Location: Remote
We are currently seeking candidates who meet the following qualifications:
Responsibilities:
- Maintain and improve the reliability, availability, and performance of large-scale production systems and services.
- Monitor system performance and troubleshoot issues to ensure minimal downtime and optimal performance.
- Design and implement automated monitoring, alerting, and incident response processes.
- Build and maintain infrastructure as code (IaC) using tools like Terraform, CloudFormation, or similar.
- Automate manual operational tasks to increase efficiency and reduce errors.
- Collaborate with development teams to ensure systems are designed for scalability, resilience, and performance.
- Develop tools and frameworks for continuous integration/continuous deployment (CI/CD) pipelines to automate deployment and testing processes.
- Conduct postmortem analysis of production incidents, contributing to the improvement of incident response procedures.
- Optimize cloud infrastructure and services, focusing on cost management, scaling, and reliability.
- Ensure security best practices are implemented across production systems.
- Participate in on-call rotations and incident management processes.
Qualifications:
- Proven experience as a Site Reliability Engineer, DevOps Engineer, or Systems Engineer in large-scale production environments.
- Strong knowledge of cloud infrastructure and services (e.g., AWS, GCP, Azure).
- Experience with containerization (Docker, Kubernetes) and orchestration tools.
- Proficiency in at least one programming or scripting language (e.g., Python, Go, Java, Bash, Ruby).
- Hands-on experience with infrastructure automation tools (e.g., Terraform, Ansible, Puppet, Chef).
- Strong understanding of system monitoring, logging, and alerting tools (e.g., Prometheus, Grafana, ELK stack, Datadog).
- Experience with CI/CD pipelines and tools (e.g., Jenkins, GitLab CI, CircleCI).
- Experience with performance tuning, scaling, and troubleshooting distributed systems.
- Knowledge of networking fundamentals and security practices.
- Experience working in an Agile environment and collaborating closely with development teams.
- Federal experience is a plus.
- Security clearance is required.
If you meet these qualifications, please submit your application via the link provided on LinkedIn.
Kindly do not call the general line to submit your application.