Site Reliability Engineers
Band: 2
The role with us
As a Site Reliability Engineer, you will have the opportunity to manage the complex challenges of scale that are unique to Telstra’s digitisation, using your expertise in coding, algorithms, complexity analysis and large-scale system design. You will provide scalable, reliable, durable and secure applications for our customers and internal users. You will help build highly reliable applications using a customer-first approach while innovating technically. You will understand our customers' needs and how we can meet them.
You will be joining the Telstra Software Engineering Usage Cash & Billing Chapter in one of our ICC locations.
We're interested in hearing from people who have
- A critical-thinking mindset, a strong sense of accountability for product delivery and a passion for developing quality software.
- Good communication skills and a team-player attitude.
- Experience working (or willing to work) with geographically distributed teams.
- A strong technical background.
- A willingness to develop your own and your peers' skills and to mentor junior peers to build a T-shaped team.
Responsibilities
- Within the Site Reliability Engineering team, you will work with the development team and other partner teams to ensure that application reliability, efficiency and performance meet our customers' needs while keeping service operations reliable, scalable and automated.
- Develop tools and automation to streamline operations and improve system reliability, efficiency and performance.
- Partner with development teams on feature launches to ensure that reliable and scalable functionality is delivered to our customers.
- Build a deep knowledge of production infrastructure and use it to debug distributed systems problems and identify improvements to the system.
- Operations and SLO/SLA management.
- Metrics reporting and progress tracking.
- Manage infrastructure costs and optimize resource utilization.
- Work with security teams to ensure compliance with security policies and procedures.
- Participate in on-call rotations to provide 24/7 support for our systems.
- Observability (alarms, monitoring, synthetics).
- Error management.
Qualifications
- Bachelor’s degree in computer science or a related engineering degree
- 8 years of IT industry experience
- Strong experience in:
  - Java, Spring Boot, Node.js, microservices, RDBMS, NoSQL
  - AWS: EC2, S3, Lambda, IAM, ECS, EKS, SQS, Kinesis
  - Observability using Splunk and New Relic
  - Infrastructure as Code using Terraform
  - APIs and event-driven approaches
  - Security patterns
- Unix/Linux systems administration. Familiarity with Docker is a must.
- Strong experience in analysing and troubleshooting large-scale distributed systems. Quick reaction to high-severity customer impacts.
- Ability to debug and optimise code and automate routine tasks.
- Familiarity with containerisation and orchestration technologies such as Docker and Kubernetes.
- Knowledge of modern software engineering practices and tools, including Agile and DevOps.
- Strong communication skills and the ability to explain complex technical matters in an easy-to-understand way.
Nice to have
- Knowledge of additional tools