Overview
The Senior Site Reliability Engineer plays a crucial role in ensuring the reliability, performance, and scalability of our systems and applications. The individual will be responsible for designing and implementing robust, scalable solutions that improve system availability and performance. This role is vital in enabling the organization to meet its service level objectives and deliver a seamless user experience.
Key responsibilities
- Engage in and improve the whole lifecycle of services, from inception and design through deployment, operation, and refinement
- Develop and maintain tools, and redesign capacity planning infrastructure for greater scalability
- Troubleshoot, diagnose, and fix software issues, and ensure data security
- Define architecture improvements and push for changes that improve reliability
Required qualifications
- Source-code-level understanding of open-source big data components such as HDFS, HBase, YARN, Spark, Flink, Airflow, Kyuubi, ZooKeeper (ZK), and Kafka
- Experience with at least one automation tool, such as Ansible or Terraform
- In-depth understanding of Linux and computer networks
- Experience in at least one programming language (Python, Golang, Java, etc.)
- Experience managing and using a public cloud (AWS, GCP, Azure, etc.) is preferred
- Minimum of 5 years of hands-on experience with backend systems or the big data ecosystem
Skills
Scripting, automation, Go (Golang), Python, Java, Ansible, Linux, computer networks, AWS/GCP/Azure, public cloud management, big data ecosystem, backend, reliability