- Experienced L3 SRE engineer supporting a business-critical SaaS application.
- Capacity to provide L3 support across the full stack, including infrastructure, backend, and frontend, before escalating to the engineering business unit.
- Capacity to automate SRE tools to provide proactive L3 support aligned with our tech monitoring strategy.
- Capacity to work under business pressure on business-critical applications.
- Capacity to communicate appropriately with L1/L2, Engineering, Product Managers, leadership, and end users during troubleshooting.
- Experience with incident and problem management.
- Experience with multi-tenant applications.
- Solid understanding of networking concepts (TCP/IP, DNS, routing, etc.), including VPCs, subnets, firewalls, load balancing, and TLS/SSL.
- Experience with CI/CD pipelines (e.g., Jenkins, GitHub Actions) and version control.
- Python, React/Next.js.
- Monitoring and logging to analyze and track resource utilization and application performance and to identify potential issues (Grafana, Prometheus, Loki, or ELK).
- Experience with AWS, particularly EKS, serverless, queues, and various databases.
- Solid knowledge of Kubernetes.
Qualifications :
Must-have Skills: EKS, GitHub Actions, Python (Strong), Kubernetes (Expert), Prometheus.
Good to Have Skills:
- Previous experience building a user-facing GenAI/LLM software application.
- Security best practices in cloud environments.
- AWS Managed Services (RDS, Batch, Lambda, Fargate, Step Functions, SQS/SNS, etc.).
- FastAPI and Next.js experience (if we're still using the latter).
- WebSockets, Server-Sent Events, Pub/Sub (RabbitMQ, Kafka, etc.).
- Cloud security concepts (IAM, access control).
- Terraform experience.
Remote Work :
Yes
Employment Type :
Full-time