Job Title: DevOps/SRE (Site Reliability Engineer)
Location: Cupertino, CA - Remote
Experience: 10-16 years - L1 level

Key Qualifications
5+ years in a Site Reliability Engineering or DevOps-focused role
Experience with Ansible
Experience with scripting languages such as Python and Bash
Experience implementing and coordinating telemetry via monitoring tools such as Splunk, Grafana, and Prometheus at various levels (API, runtime, infrastructure, log analysis, etc.)
Experience with container and container orchestration technologies such as Docker, Kubernetes, and EKS
Experience with systems built on open source storage and search technologies such as Cassandra, Postgres, Redis, and Elasticsearch
Experience with scale testing, disaster recovery, and capacity planning
Experience designing, building, and maintaining infrastructure with a cloud provider such as AWS
Strong Linux system administration and networking knowledge
Strong sense of ownership, while also being a great teammate who communicates clearly and transparently
Self-motivated, inquisitive, and always looking to learn more

Description
As an SRE/DevOps engineer on the Reliability Software team, you will:
Take on high-level problem statements, own them, and drive them to solutions
Implement solutions that operate at scale to improve the reliability of the team's data warehouse platform
Develop, operate, monitor, and automate the team's infrastructure tools and services, both on-prem and in AWS
Pioneer and implement monitoring tools for a complete telemetry system
Actively participate in capacity planning, scale testing, and disaster recovery exercises