This is where you and your skills come in. We're currently looking for a Site Reliability Engineer (DevOps Engineer) for our AI SRE team.
Responsibilities:
- Provide ongoing support for production environments: troubleshoot, manage and resolve incidents, perform RCA, and facilitate blameless postmortems
- Actively contribute to production stability improvements (process improvements)
- Take care of monitoring, alerting, logging, and capacity management
- Implement CI/CD related pipelines and automations
- Understand the architecture of our services and products
- Design automated software and product upgrades, change management, and release management solutions
- Interact with our internal customers, mostly Developers/QA and OPS/SRE
- Work with teams responsible for Infrastructure, Networking, Applications Engineering, and Information Security
- Continuously improve and share knowledge of the system; update documentation
- Participate reliably in on-call rotations and schedules
Requirements:
- Solid knowledge and strong experience in production support activities
- Understanding of SRE principles and DevOps practices
- Troubleshooting process understanding and experience
- Experience as a Linux System Administrator: at least 2-3 years
- Key Kubernetes (K8s) DevOps skills: an understanding of networking, security, and storage is critical
- AWS cloud experience (IAM, VPC, R53, AZs, EC2/EKS, RDS, S3, CloudFront, CloudWatch)
- Understanding of real-time data streaming (Kafka)
- Understanding of the basic concepts of Elasticsearch (node, cluster, index, document, shard, replicas)
- RDBMS administration experience (preferably PostgreSQL/AuroraDB, or MySQL)
- Experience working with Git, Prometheus, Grafana, Terraform, Helm
- Knowledge of CI/CD tools and the ability to automate deployment activities
- Familiarity with application and service monitoring tools and techniques
- Effective communication skills (active listening, friendliness, confidence, sharing feedback, respect)
- English: Intermediate (B1)
Personal skills:
- Team player
- Fast learner
- Documentation culture
Will be a strong plus:
- System thinking approach
- Expertise automating system administration tasks with configuration management tools
- Real automation experience (Python, Bash, Golang)
- MS Azure/GCP cloud experience
- Flux, Kustomize/Flagger/Strimzi, Istio
- Experience with MongoDB
- Experience with the ELK stack
- Web server administration experience: Nginx
What we offer:
- Well-coordinated professional team.
- Cutting-edge technologies, interesting and challenging tasks, a dynamic project, great opportunities for self-realization, professional and career growth.
- Additional Health and Life Insurance Package.
- Employee Assistance Program.
- 25 vacation days.