Overview:
TekWissen is a global workforce management provider headquartered in Ann Arbor, Michigan, that offers strategic talent solutions to clients worldwide. Our client is a provider of digital technology and transformation, information technology, and services.
Position: SRE (Fintech)
Location: Atlanta, GA 30346
Duration: 7 Months
Job Type: Contract
Work Type: Hybrid
Job Description:
- We are looking for an experienced Platform Site Reliability Engineer (SRE) with deep expertise in Kubernetes and AWS to help us enhance the performance, scalability, and reliability of our Digital Payment platform.
- You will play a critical role in ensuring the availability and resilience of our cloud-native services, with a focus on automation, monitoring, and performance optimization.
- As a Platform SRE with a focus on Kubernetes and AWS, you will work with cross-functional teams to design, implement, and maintain scalable, secure, and high-performing infrastructure.
- You will be responsible for managing Kubernetes clusters, automating infrastructure deployments, and leveraging AWS services to ensure platform reliability, availability, and continuous improvement.
Key Responsibilities:
- Kubernetes Management: Deploy, manage, and optimize Kubernetes clusters in production and staging environments, ensuring high availability and efficient resource utilization.
- AWS Infrastructure: Leverage AWS cloud services (EC2, S3, RDS, EKS, Lambda, etc.) to build, manage, and scale cloud-native infrastructure.
- Automation & Infrastructure as Code: Develop and maintain automated workflows using Infrastructure as Code (IaC) tools like Terraform, CloudFormation, or Ansible to provision, configure, and manage cloud infrastructure.
- CI/CD Pipeline Support: Build, optimize, and maintain CI/CD pipelines to enable seamless code delivery and deployments using tools like Jenkins, GitLab CI, or CircleCI.
- Monitoring & Observability: Implement and maintain monitoring, alerting, and logging solutions using tools such as Prometheus, Grafana, CloudWatch, or the ELK stack to ensure system health and availability.
- Incident Response: Lead and support incident response efforts, conduct root cause analysis, and implement post-incident reviews to improve system resilience.
- Performance Optimization: Identify and resolve performance bottlenecks, improve system efficiency, and ensure applications and infrastructure are optimized for both cost and performance.
- Security & Compliance: Work with security teams to implement best practices for securing Kubernetes clusters, AWS resources, and platform infrastructure, including access controls, network policies, and encryption.
- Collaboration & Documentation: Work closely with development, DevOps, and infrastructure teams to align on best practices, improve automation, and document procedures for infrastructure management and troubleshooting.
Required Qualifications:
- Experience: 3 years of experience as a Site Reliability Engineer, DevOps Engineer, or in a similar role, with hands-on experience in cloud-native infrastructure.
- Kubernetes Expertise: Strong expertise in managing and scaling Kubernetes clusters, including experience with Kubernetes networking, storage, and multi-cluster architectures.
- AWS Cloud Expertise: Proficiency with AWS services such as EC2, S3, EKS, RDS, VPC, Lambda, IAM, CloudWatch, and others.
- Experience with AWS best practices for scalability, security, and cost management.
- Infrastructure as Code (IaC): Hands-on experience with IaC tools such as Terraform, AWS CloudFormation, or Ansible for provisioning and managing cloud infrastructure.
- CI/CD Pipelines: Experience building and maintaining continuous integration and continuous deployment (CI/CD) pipelines using Jenkins, GitLab CI, or similar tools.
- Scripting & Automation: Proficiency in scripting languages such as Python, Bash, or Go to automate operational tasks and improve workflows.
- Monitoring & Logging: Experience with monitoring, logging, and alerting tools like Prometheus, Grafana, CloudWatch, the ELK stack, or similar tools.
- Troubleshooting & Incident Management: Ability to troubleshoot complex issues in distributed systems, conduct root cause analysis, and implement solutions to prevent recurrence.
- Collaboration Skills: Strong communication skills with the ability to work collaboratively with developers, operations, and product teams.
Must Have Skills:
- 8 years of experience with Kubernetes and AWS Cloud
- 8 years of experience in Reliability Engineering
- 8 years of experience with Java Microservices
TekWissen Group is an equal opportunity employer supporting workforce diversity.