Job Title: Site Reliability Manager
Location: Plano, TX or Richmond, VA (Hybrid)
Eligibility: Must be eligible to convert to full-time
(Rate can go higher for ex-Capital One contractors)
Only GC/USC/H4 EAD
- Interview Process: Two rounds. The first is with the Engagement Manager; the second is a panel interview.
- Technical Assessments Required: Technical questions and a possible coding challenge.
Key Terms on Resume: Python, AWS (EC2, DynamoDB), Git/GitHub, Jenkins, Docker, Site Reliability Engineer, NodeJS, Angular, React JS
Tech Must-Haves:
- 5 years as a Site Reliability Engineer.
- Python is the preferred language.
- Programming Languages: NodeJS, React JS, and Angular are preferred but not required and can be taught.
- AWS for Cloud Development (EC2, DynamoDB).
- Familiarity with REST APIs.
- Solid understanding of SRE principles, including proactive monitoring and self-healing system design.
- Proficient in common DevOps tools (Git/GitHub, Jenkins, Docker).
Overview:
Capital One is seeking a Site Reliability Manager with a strong background in cloud-based solutions (preferably AWS) and a passion for driving automation, building self-healing systems, and leveraging Site Reliability Engineering (SRE) principles. You will provide technical leadership to ensure the stability, scalability, and performance of our applications, identifying opportunities for automation and proactive monitoring solutions.
Key Responsibilities:
- Cloud Expertise: Deep understanding of cloud-based solutions and services, with a focus on AWS (EC2, DynamoDB).
- Automation & Scripting: Lead automation efforts by implementing scripting, machine learning, and self-healing systems.
- DevOps Best Practices: Provide technical leadership around DevOps tools (Git/GitHub, Jenkins, Docker) and best practices.
- Production Support: Ensure systems are highly reliable, drawing on experience with production support and monitoring tools (Splunk, New Relic).
- Technology Stack: Proficiency in Python, NodeJS (NR Synthetics), ReactJS, Java, and API integration using REST.
- Monitoring & Alerting: Develop and implement automated monitoring and alerting solutions to minimize manual interventions.
- Zero-Touch Automation: Identify opportunities to reduce manual validation and promote zero-touch automation and self-healing systems.
Requirements:
- Experience with AWS cloud services (EC2, DynamoDB).
- Proficient in common DevOps tools (Git/GitHub, Jenkins, Docker).
- Strong skills in Python, NodeJS, ReactJS, and Java.
- Familiarity with production support processes and REST APIs.
- Solid understanding of SRE principles, including proactive monitoring and self-healing system design.