Overview
As a Senior DevOps Engineer specializing in Prometheus/Grafana, you will play a crucial role in designing, implementing, and maintaining DevOps processes, tools, and infrastructure. You will be responsible for ensuring the reliability, scalability, and performance of our systems through effective monitoring and automation.
Key Responsibilities
- Design, implement, and manage comprehensive monitoring solutions to ensure high availability and performance of our microservices, infrastructure, and applications.
- Use advanced monitoring tools and scripting to automate the monitoring of our cloud environments, with a focus on AWS.
- Develop and maintain robust logging and alerting mechanisms to identify and mitigate potential issues proactively.
- Collaborate with the infrastructure team to integrate monitoring solutions into the CI/CD pipeline, ensuring seamless deployments and operations.
- Conduct performance analysis, capacity planning, and scalability testing to ensure our systems meet current and future demands.
- Lead incident response and troubleshooting efforts, using monitoring data to resolve operational issues quickly.
Required Qualifications
- Minimum of 5 years of hands-on experience with Kubernetes, Elasticsearch, Prometheus, Grafana, and AWS, with a strong emphasis on monitoring and observability in cloud-native environments.
- Proficiency in programming languages (such as Python, Go, or Rust) for automating monitoring tasks.
- Experience with infrastructure as code (IaC) tools and a strong understanding of CI/CD principles, including experience with Docker and Kubernetes for container orchestration.
- Deep knowledge of monitoring tools (such as Prometheus, Grafana, or the ELK stack) and strategies for large-scale environments.
- Proven track record in managing and troubleshooting large-scale distributed systems, with an emphasis on performance tuning and optimization.
- Excellent problem-solving skills, with a focus on delivering high-quality, reliable, and scalable infrastructure solutions.
- Strong communication and teamwork skills, with the ability to work effectively in a fast-paced, collaborative environment.
teamwork,elk stack,communication,automation,problem-solving,python,iac,troubleshooting,grafana,cd,ci/cd,scripting,aws,monitoring,prometheus,go,devops,rust,elasticsearch,docker,kubernetes,performance tuning,ci