Job Title: Site Reliability Engineer (US Citizens Only)
Location: Remote
We are currently seeking candidates who meet the following qualifications:
Responsibilities:
- Design, implement, and maintain scalable and reliable infrastructure.
- Automate deployment, monitoring, and scaling processes.
- Develop tools and scripts to improve system observability and operational efficiency.
- Monitor system performance, identify bottlenecks, and optimize performance.
- Collaborate with development teams to ensure best practices for reliability and scalability are implemented.
- Troubleshoot and resolve incidents, ensuring minimal impact on service availability.
- Implement and manage incident response processes, including root cause analysis and postmortems.
- Stay current with the latest technologies and trends in site reliability engineering.
Qualifications:
- Bachelor's degree in Computer Science, Information Technology, or a related field.
- Experience in site reliability engineering, DevOps, or a related field.
- Strong knowledge of cloud platforms (e.g., AWS, Azure, Google Cloud).
- Proficiency in automation and scripting languages (e.g., Python, Bash, Go).
- Experience with containerization and orchestration tools (e.g., Docker, Kubernetes).
- Experience with monitoring and logging tools (e.g., Prometheus, Grafana, ELK Stack).
- Excellent troubleshooting and problem-solving skills.
- Strong communication and collaboration skills.
- Federal experience is a plus.
- Security clearance is required.
If you meet these qualifications, please submit your application via the link provided on LinkedIn.
Kindly do not call the general line to submit your application.