Job Title: Site Reliability Engineer (SRE), Kubernetes
Location: Austin, Texas (Onsite)
Experience: 5 years
Job Summary:
We are seeking a highly skilled Site Reliability Engineer (SRE) with expertise in Kubernetes to join our team in Austin, Texas. The ideal candidate will be responsible for maintaining and improving the reliability, scalability, and performance of our cloud infrastructure. You will collaborate with development and operations teams to implement best practices for automation, monitoring, and incident response.
Key Responsibilities:
- Design, deploy, and manage Kubernetes clusters in production environments.
- Automate infrastructure provisioning, deployment, and monitoring using tools like Terraform, Helm, and Ansible.
- Ensure high availability, security, and reliability of distributed systems running on AWS, GCP, or Azure.
- Implement and maintain CI/CD pipelines to streamline deployments and reduce manual intervention.
- Develop observability solutions using Prometheus, Grafana, ELK Stack, or Datadog.
- Improve incident management processes, conduct postmortems, and implement corrective actions to prevent recurrence.
- Optimize cloud resource utilization and implement cost management strategies.
- Work closely with development teams to enhance application performance and foster a DevOps culture.
- Ensure security and compliance by implementing RBAC, network policies, and secrets management in Kubernetes.
- Troubleshoot and resolve performance bottlenecks, network issues, and infrastructure failures.
Requirements:
- 5 years of experience as an SRE, DevOps Engineer, or Kubernetes Engineer.
- Hands-on experience with Kubernetes administration, deployment strategies (Helm, Kustomize), and service meshes (Istio, Linkerd).
- Strong knowledge of containerization technologies like Docker.
- Experience with Infrastructure as Code (IaC) using Terraform, Pulumi, or CloudFormation.
- Proficiency in scripting and automation (Python, Bash, Go, or PowerShell).
- Experience with observability and monitoring tools (Prometheus, Grafana, ELK Stack, Datadog, New Relic).
- Strong understanding of networking, DNS, and load balancers (NGINX, Traefik, HAProxy).
- Experience managing cloud infrastructure in AWS, GCP, or Azure.
- Working knowledge of CI/CD tools such as Jenkins, GitHub Actions, ArgoCD, or Flux.
- Familiarity with security best practices, including RBAC, IAM, SSO, and certificate management.
- Experience with disaster recovery and incident response processes.
Preferred Skills:
- Experience with multi-cluster Kubernetes deployments.
- Familiarity with serverless architectures (AWS Lambda, Google Cloud Functions).
- Experience working in FinTech, Healthcare, or SaaS environments.
- Knowledge of database administration for PostgreSQL, MySQL, or NoSQL databases.
- Certifications such as CKA (Certified Kubernetes Administrator), AWS Certified DevOps Engineer, or Google Professional Cloud DevOps Engineer.
Technical Skills:
Kubernetes, Docker, Terraform, Helm, Ansible, AWS, GCP, Azure, Prometheus, Grafana, ELK Stack, Datadog, Jenkins, GitHub Actions, ArgoCD, Flux, Python, Bash, Go, PowerShell, Istio, Linkerd, NGINX, Traefik, HAProxy, RBAC, IAM, SSO, CloudFormation, Pulumi, PostgreSQL, MySQL, NoSQL, serverless architectures.
Benefits
Why Join Us
- Competitive salary and benefits package.
- Opportunity to work with cutting-edge, cloud-native technologies.
- Collaborative and dynamic work environment.
- Career growth and learning opportunities with access to training and certifications.
- Flexible work schedule with remote/hybrid options.
How to Apply:
If you are passionate about Kubernetes, cloud infrastructure, and automation and are looking for an exciting opportunity in Austin, Texas, apply now by sending your resume to
Note: Sponsorship is not available for this role. Candidates must be authorized to work in the U.S. without employer sponsorship.