Salary Not Disclosed
1 Vacancy
As a Site Reliability Engineer (SRE), you will be responsible for ensuring the reliability, scalability, and performance of the systems that support the products and services for the Data Engineering projects.
You will work closely with function developers, architects, and DevOps teams to build and maintain high-availability systems capable of handling high workloads, and to automate the infrastructure with active monitoring.
As an SRE, you will ensure system reliability and availability for continuous deployment as part of the Agile practices in solution development.
Mandatory Skills & experience in:
Experience with cloud platforms, specifically Azure.
Hands-on experience and proficiency in cloud infrastructure and CI/CD frameworks for providing IaC (Terraform, ARM, YAML) and cloud-native containerization and deployment of services, such as Docker and k8s.
Hands-on experience with large-scale Azure DevOps and Azure PaaS components.
Must-have tool knowledge: Argo (including Argo Events and Argo Workflows), Terraform (CLI), Azure CLI, kubectl, Flux, Helm, Istio, Grafana, Kustomize, and YAML-based coding and debugging skills.
Must have a Kubernetes administration skill set; knowledge of tools and extensions for Kubernetes is good to have.
Understanding of function development for data science solutions, and experience with programming languages such as Python and Go.
Excellent problem-solving skills and attention to detail.
Hands-on experience architecting and developing features using microservice (µService) application principles.
Deep understanding of Service Level Objectives (SLOs), Service Level Indicators (SLIs), error budgeting, and configuring KPIs for highly sophisticated services.
Experience with the ELK stack (Elasticsearch, Logstash, Kibana) and Prometheus for monitoring and logging.
Solid expertise in applying cloud security best practices through DevSecOps principles, with a deep understanding of Kubernetes (k8s) security.
Preferred Skills & experience in:
Experience with DevOps, data pipelines, and various messaging systems in a cloud-native setup (MS Azure).
Experience with database technologies (MongoDB, NoSQL, etc.) and cloud-native optimization services.
Strong working knowledge of Azure.
A motivating attitude, excellent communication and strong interpersonal skills, and a structured, analytical approach.
Knowledge of cost optimization techniques for large-scale cloud-native services.
Key Responsibilities:
System Reliability: Design and engineer highly scalable, high-availability systems for high-throughput workloads.
Continuous Monitoring & Active Alerting: Develop, deploy, and manage monitoring systems, setting up alerts to proactively identify and resolve issues.
Automation: Automate routine tasks such as deployments, monitoring, and policy enforcement using suitable frameworks.
Performance Tuning: Optimize system performance by identifying bottlenecks and implementing appropriate solutions.
Infrastructure as Code (IaC): Use tools like Terraform, Ansible, or similar to manage infrastructure through code, ensuring consistency and repeatability.
Security: Understand and implement the security policies and enforcement defined by the organization for infrastructure and data.
Scaling & Cost Management: Analyze system performance and plan for future scaling needs.
Issue Handling and Resolution: Respond to system outages, perform root cause analysis, and implement fixes to prevent future incidents.
Qualifications:
Master's degree or Bachelor's degree in Computer Science, Information Science, or an equivalent engineering stream.
Additional Information:
6-8 years of hands-on experience maintaining large-scale, high-availability data engineering solutions and services.
Remote Work:
No
Employment Type:
Full-time