Salary Not Disclosed
1 Vacancy
Opening / Selling Statement
We are seeking a Senior DevOps Engineer with a strong Site Reliability Engineering (SRE) focus to support Jeppesen's transition of Crew Management Applications to a web-based SaaS model hosted on AWS. This role is critical to the success of this transition: it bridges development and customer support, maintains system reliability, and implements automation and monitoring solutions.
Required Skills: DevOps, Cloud infrastructure, Kubernetes
Job Duties
Deliverables Alignment: Develop solutions in line with key deliverables, including metrics collection, dashboards, reliability audits, and runbooks.
Liaison Role: Act as the primary interface between the development team in Sweden and the US-based customer support team.
Automation and CI/CD: Build and optimize CI/CD pipelines and scripts to automate generation, testing, deployment, and monitoring of customized builds.
Observability: Implement and refine monitoring solutions using OpenTelemetry and Grafana for enhanced visibility into system performance.
Reliability Audits: Conduct reliability audits for existing deployments, document findings, rank issues by criticality, and address concerns through merge requests or escalations.
Production Support: Provide 24/7 Tier II production support on a rotational basis, handling escalations and minimizing downtime.
Training and Documentation: Prepare technical training and documentation, including runbooks, playbooks, and onboarding materials, for Tier I and Tier II support teams.
Dashboards and Metrics: Develop Grafana dashboards for approximately 50-70 services, including Kubernetes platform and internal services.
Issue Resolution: Investigate and resolve issues reported by lower-tier teams, ensuring timely resolution and continuous improvement.
Game Day Scenarios: Collaborate with teams to plan and execute Game Day scenarios that simulate likely system failures and prepare teams to handle them.
Collaboration: Work closely with cross-functional teams to enhance operational efficiency and contribute to system and application improvements.
Job Requirements
Experience: 8 years in DevOps, SRE, or similar roles, with a focus on cloud-hosted, microservices-based environments.
Technologies: Expertise in Kubernetes, AWS EKS, Terraform, ArgoCD, OpenTelemetry, and Grafana.
DevOps Practices: Strong knowledge of CI/CD, infrastructure-as-code (IaC), and automation frameworks.
Observability: Proven experience in implementing observability tools and frameworks for metrics collection and system monitoring.
Incident Management: Background in production support, troubleshooting, and resolving critical system issues.
Documentation: Strong technical writing skills for creating incident runbooks, playbooks, and support materials.
On-Call Readiness: Willingness to participate in 24/7 rotational production support, including incident escalation and resolution.
Desired Skills & Experience
Experience conducting reliability audits and implementing scalable solutions.
Familiarity with GitOps practices and tools like GitLab.
Proficiency in building automated remediation for alerts and contributing to infrastructure reliability enhancements.
Background in supporting SaaS transitions, particularly in customer-facing and revenue-generating environments.
Required Skills : DevOps
This is a high-priority, proactive requisition.
Background Check : No
Drug Screen : No
Full Time