Project: the aim you'll have
We're looking for a skilled Senior SRE Engineer to join a team that works on a complex distributed architecture spanning physical machines, virtualized on-prem hosts, and cloud computing. Our Client develops and deploys systematic financial strategies across a variety of asset classes and global markets, and our teams work collaboratively to drive the production of high-quality predictive signals and financial strategies, the foundation of a sustainable global investment platform.
If you enjoy working with cutting-edge technologies in a fast-paced environment, this opportunity is for you!
Qualifications:
Expectations: the experience you need
- 5 years of proven experience in SRE.
- Deep expertise and hands-on experience working with Linux-based systems, with a focus on optimization and troubleshooting.
- Strong skills in Python for scripting, automation, and system management.
- In-depth knowledge of container orchestration technologies such as Kubernetes (K8S). Experience with other cluster management tools like Slurm is a plus.
- Hands-on experience with tools like Helm, Terraform, and Ansible to manage infrastructure in a scalable and automated way.
- Strong working knowledge of Docker, Podman, or other containerization systems to enable efficient and consistent deployment.
- Experience working with CI/CD tools, especially GitLab (preferred), GitHub, or Git, to ensure smooth and rapid delivery cycles.
- Experience with monitoring and logging solutions such as Prometheus, Grafana, and the ELK stack to provide comprehensive insights into system performance and health.
- Understanding of relational databases, their performance tuning, and their management in distributed systems.
- Familiarity with Agile development methodologies, with a focus on continuous improvement and collaboration.
- Exposure to cloud technologies such as AWS or Google Cloud (GCP) is a strong plus.
Position: how you'll contribute
- Architecture and Automation: Design and deploy As-a-Service solutions using open-source software to automate system management, scaling, and monitoring.
- System Optimization: Develop tools to streamline deployment, monitoring, and incident management for large-scale distributed environments.
- Collaboration Across Teams: Work with development and operations teams to design and implement software solutions that enhance the overall reliability of services. Contribute to the ongoing DevOps and Agile transformation.
- Monitoring & Incident Response: Set up, configure, and maintain monitoring and alerting systems to ensure real-time visibility into system performance. Participate in on-call rotations to respond to incidents and mitigate downtime.
- CI/CD & Infrastructure Management: Continuously improve CI/CD pipelines using tools like GitLab, Helm, Terraform, and Ansible, ensuring fast, safe, and reliable deployments.
- Container Orchestration: Leverage container orchestration platforms like Kubernetes (K8S) to manage distributed systems at scale. Experience with Slurm or similar cluster management is a plus.
- Cloud and Automation Tools: Use cloud infrastructure (AWS, GCP, etc.) and Infrastructure as Code (IaC) tools to automate the provisioning and scaling of resources.
Our Benefits
- Educational resources.
- Flexible schedule and Work From Anywhere.
- Referral Program.
- Supportive and chill atmosphere.
We are accepting applications from LATAM countries.
Position at: Software Mind LATAM
Remote Work: Yes
Employment Type: Full-time