We need a senior (10 years) Site Reliability Engineer/SRE (NOT PURE DEVOPS) with excellent experience working with AWS (certifications preferred). Candidates must have experience in architecting, implementing, and managing monitoring tools such as Prometheus/Grafana, CloudWatch, Splunk, New Relic, and ELK in the cloud. Candidates also need strong Linux OS-level and command-line/scripting knowledge, an understanding of configuration management principles, and experience with computer provisioning on a cloud-based platform using Terraform and/or CloudFormation.
****** This is not a pure DevOps position. CANDIDATES MUST HAVE THE TITLE OF SRE. PLEASE DO NOT SEND ME CANDIDATES WITH DEVOPS TITLES.
Location: Hybrid, NYC/Midtown. No relocation. Candidates must be onsite from day one and go into the office three times a week.
Total IT experience:
Years working with: SRE
Years working with: AWS
Years working with: Linux
Years working with: Terraform/Cloud
Job Description:
The Platform Infrastructure team at iCapital plays a critical role in keeping the production and development environments running smoothly and securely. This role will utilize advanced cloud capabilities to facilitate the Platform Infrastructure strategy of market agility and lean operating principles, with a strict focus on quality, to meet the ever-growing demands of our clients. iCapital is seeking a highly collaborative, creative, and intellectually curious Platform Infrastructure Engineer who is passionate about forming and implementing cutting-edge cloud computing capabilities. This Platform Infrastructure Engineer will wear multiple hats in a highly visible role; interacting with all aspects of the business is essential.
Responsibilities
- Build highly available solutions across the entire SDLC stack, with a primary focus on an internet-facing fintech site.
- Develop and maintain tools to support the macOS and Linux development environments, with a focus on improving developer productivity.
- Maintain site reliability with a focus on building highly scalable systems, integrating resiliency and high availability at all levels.
- Develop software and tooling to secure and automate cloud infrastructure, building software delivery capabilities with fully automated workflows.
- Design and operate a Kubernetes environment for container management and orchestration.
- Participate in on-call rotations to help understand the system while helping build tools for automation.
Qualifications
- 10 years of DevOps, TechOps, or SRE experience, with 5 years of AWS experience
- Microservices (Docker, Kubernetes) experience in a production environment strongly desired
- Strong Linux OS-level and command-line/scripting knowledge and an understanding of configuration management principles
- Working knowledge of databases such as MongoDB, Postgres, and DynamoDB
- Experience in architecting, implementing, and managing monitoring tools such as Prometheus/Grafana, CloudWatch, Splunk, New Relic, and ELK in the cloud
- Coding beyond simple scripting, with strong opinions on maintainable/reusable code in Python, Ruby, or Java, desired
- Experience with computer provisioning on a cloud-based platform using Terraform and/or CloudFormation
- Experience with distributed systems design and maintenance, plus hands-on troubleshooting/debugging skills
- Exceptional analytical skills; able to apply knowledge and experience in decision-making to arrive at creative and commercial solutions
- Experience building a microservice-based architecture
- Excellent written and verbal communication skills
- Experience in updating runbooks, tools, and documentation that help the team respond to incidents proactively
- Able to design and implement complex but easily managed automated infrastructure
- A desire to share, teach, and learn as part of a team
- AWS certifications are a plus