Title: DevOps Engineer
Location: Jersey City, NJ (Day 1 Onsite)
Duration: 6 Months
Position's General Duties and Tasks
In this role you will be responsible for:
Site Reliability Engineering (SRE):
- Monitor, maintain, and improve the reliability, availability, and performance of critical services.
- Develop and implement monitoring solutions (e.g., Prometheus, Grafana) to track system health and performance.
- Automate repetitive tasks and improve infrastructure efficiency using tools such as Terraform.
- Create and maintain Service Level Objectives (SLOs), Service Level Agreements (SLAs), and Service Level Indicators (SLIs) to drive reliability improvements.
- Participate in on-call rotations to handle incident response, root cause analysis, and mitigation strategies.
Release Management:
- Manage the end-to-end software release lifecycle, ensuring timely and smooth releases across all environments.
- Work with different teams to coordinate and validate code releases.
- Create and maintain a release calendar in collaboration with product and engineering teams to plan upcoming deployments.
- Troubleshoot issues during the release process and ensure post-release validation.
- Track and report release metrics to identify improvement opportunities and minimize downtime.
DevOps:
- Design, implement, and manage CI/CD pipelines (e.g., Jenkins, GitLab CI) to support continuous integration and deployment.
- Develop Infrastructure as Code (IaC) practices using tools like Terraform, AWS CloudFormation, or similar to manage infrastructure environments.
- Collaborate with development teams to create scalable solutions that meet business and technical requirements.
- Support containerization and orchestration efforts (Docker, Kubernetes) for application deployments.
- Drive adoption of DevOps best practices across teams, fostering a culture of automation and agility.
Requirements for this role include:
- Bachelor's degree in Computer Science, Engineering, or a related field (or equivalent experience).
- 5 years of experience in SRE, DevOps, or related roles.
- Strong knowledge of cloud platforms (AWS, GCP, Azure) and cloud-native infrastructure.
- Experience with monitoring and observability tools (e.g., Prometheus, Grafana, Datadog, ELK Stack).
- Proficiency in scripting languages (Python, Bash, Go) and automation tools (Ansible, Puppet, Terraform).
- Hands-on experience with CI/CD tools (e.g., Jenkins, GitLab, CircleCI).
- Experience with release management, managing production deployments, and ensuring stable releases.
- Familiarity with containerization technologies (Docker) and orchestration tools (Kubernetes).
- Strong problem-solving skills, attention to detail, and ability to work in a fast-paced environment.
- Knowledge of GitOps, chaos engineering, and incident management tools (PagerDuty, Opsgenie).