Job Title: Principal Infrastructure Resilience Engineer
Location: Pune / Bengaluru
Experience: 10 Years
Notice Period: 0-15 days
Mandatory Skills:
- Strong expertise in datacenter technologies, including servers, networks, and related components.
- Hands-on experience with Ansible, including developing playbooks, roles, and configurations.
- Proficiency in Infrastructure as Code (IaC) using Terraform to create and manage on-premises datacenter resources.
- Familiarity with CI/CD pipelines, including writing YAML pipelines and using GitHub Actions.
What You'll Do:
- Datacenter Leadership: Design, implement, and optimize resilient datacenter infrastructure, ensuring high availability and robust performance.
- Automation & IaC: Develop Ansible playbooks and Terraform configurations to automate infrastructure provisioning and management.
- CI/CD Integration: Create and maintain YAML pipelines and implement workflows using GitHub Actions to streamline delivery processes.
- System Resilience: Lead initiatives to enhance system robustness, including chaos engineering, disaster recovery planning, and incident resolution.
- Collaboration & Documentation: Work closely with cross-functional teams to align on resiliency goals and produce architecture documentation and operational reports.
- Incident Management: Support incident resolution, perform root cause analysis (RCA), and implement corrective measures to prevent recurrence.
What You Should Have:
- Strong technical expertise in datacenter technologies, automation, and IaC tools.
- Demonstrated experience with Ansible and Terraform for provisioning and configuration management.
- Knowledge of CI/CD workflows, including YAML pipelines and GitHub Actions.
- Excellent problem-solving skills, the ability to work independently, and strong collaboration abilities.
- A proactive attitude with a focus on operational excellence and resilience engineering.
Good to Have:
- Experience with ServiceNow for CI discovery, incident management, and architecture documentation.
- Familiarity with monitoring tools and observability practices to ensure real-time system performance insights.
- Exposure to Agile environments and experience working with hybrid cloud infrastructures.