Site Reliability Engineer with GCP

Job Location

Alpharetta, GA - USA

Monthly Salary

Not Disclosed

Job Description

I have an opportunity for a Site Reliability Engineer with GCP in Alpharetta, GA (onsite, local candidates only), and I am looking for a candidate who can join immediately. If you are interested, reply with your updated resume; if you can refer someone, I would really appreciate it.

Position: Site Reliability Engineer with GCP
Location: Alpharetta, GA (onsite)
Duration: 6 months
Visa Status: Any (no OPT)
Candidates must be local to GA; a face-to-face (F2F) interview is required.

Job Description: Seeking an experienced Site Reliability Engineer who can operate independently with limited guidance and oversight. This individual will be passionate about the end-user experience and will be part of a tight-knit, distributed engineering team developing and delivering a comprehensive data operations management solution for the Client Data Fabric Platform. The SRE plays a critical role across the entire SDLC, from coding and scaling to ensuring production stability, which includes responding to on-call incidents.

Data Fabric is a GCP cloud-native, modern data management platform that allows the Client to acquire and curate data, provide entity resolution, and ingest data into a single environment. It is deployed globally in multiple regions, is highly secured, and complies with regional and internal regulatory controls under strict governance and oversight. Business units, data scientists, and many other stakeholders use APIs to consume data managed by the Data Fabric and operate data exchanges to monetize data through B2B and B2C channels.

The data operations management solution consists of:

A web portal (UI/UX) that provides a single point of access to all data management and data reliability engineering

A suite of backend API services that serves the UI and integrates with low-level Data Fabric and other third-party system APIs

A modern data lakehouse (data lake, data warehouse, batch and streaming ELT pipelines)

The data operations roadmap envisions a set of rich management capabilities, including:

Serving a large community of geographically dispersed data operations stakeholders

Data quality and observability management to detect, alert on, and prevent data anomalies

Troubleshooting, triaging, and resolving data and data pipeline issues

OLAP, batch and streaming big data processing, and BI reporting

MLOps

Real-time dashboards, alerting and notifications, case management, user/group management, AuthZ, and many other foundational capabilities

Tech Stack

Frontend: Angular 17, JavaScript, TypeScript, HTML, SCSS, Webpack Module Federation, Tailwind CSS, Angular Material, Angular Elements

Backend: Java (JDK 17), Spring Framework 6.X.X, Spring Boot 3.X.X, NestJS 10.X.X, REST and GraphQL microservices, NodeJS

Tools & Frameworks: Nx build management, monorepo architecture, Jenkins CI/CD, Fortify, Sonar, GitHub

Cloud & Data: GCP (GKE, Composer, Airflow, Dataflow, Apache Beam, BigQuery, Bigtable, Firestore, GCS, Pub/Sub, Vertex AI), Terraform, Helm charts, GitOps

Other Technologies: WebSockets, SSE, event-driven architecture

Environment

Culture: Fast-paced, creative, results-oriented

Team Structure: Agile, working in 2-week sprints using Aha and Jira for project management

Expectations: Self-starters who can work independently with limited guidance, delivering solutions that end users value and love

General Responsibilities

Contribute to Development Activities: The SRE is expected to participate in SDLC activities, including design, development, testing, deployment, and operations, covering both frontend and backend

Cross-Functional Work: Collaborate with global teams to integrate with existing internal systems and GCP cloud

Issue Resolution: Triage and resolve product or system issues, ensuring quality and performance

Documentation: Write technical documentation, support guides, and runbooks

Agile Practices: Participate in sprint planning, retrospectives, and other agile activities

Compliance: Ensure software meets secure development guidelines and engineering standards

SRE Accountability

General: Use coding, automation, and software engineering principles to ensure scalability, performance, and reliability efficiently and toil-free

IaC: Build infrastructure as code (IaC) patterns that meet security and engineering standards using one or more technologies (Terraform, scripting with cloud CLI, and programming with cloud SDK)

CI/CD: Build CI/CD pipelines for build, test, and deployment of application and cloud architecture patterns using platform (Jenkins) and cloud-native toolchains

Automation: Build automated tooling that deploys service requests and pushes changes into production. Build comprehensive, detailed runbooks to manage, detect, remediate, and restore services

Change Management: Work closely with the dev team to ensure all DevSecOps issues are addressed in a timely manner, in compliance with Equifax security policies and in adherence to the Engineering Handbook

Incident Management: Solve problems and triage complex distributed architecture service maps. Be on call for high-severity application incidents and improve runbooks to reduce MTTR

RCA and Postmortem: Lead root cause analysis and blameless postmortems, and own the call to action to prevent recurrences

Customer Focus: Address service disruptions and downtime, ensuring end-customer needs are met, and drive processes for a flawless customer experience

Reliability and Availability: Ensure that SRE golden signals are monitored and that SLOs, SLIs, and SLAs are honoured within error budgets. Work closely with devs, QE, POs, and other stakeholders, providing continuous feedback on uptime, scalability, and reliability, and influence best practices with the aim of providing excellent operational experiences

Reliability Roadmap: Own the reliability roadmap by taking a holistic view of all data operations management capabilities, which includes participating in Production Readiness Reviews (PRR) and working with stakeholders to ensure DR plans are in place

Must-Have Skills

General Experience: 5-7 years of experience in software engineering, systems administration, database administration, and networking. System administration skills, including automation and orchestration of Linux/Windows using Terraform, Chef, Ansible, and/or containers (Docker, Kubernetes), and shell scripting

Cloud-Native Application Development: 3 years. Solid experience developing and supporting cloud-native applications. Experience with cloud-based security: IAM, AuthZ

End-User Application Experience: 3 years of experience as an SRE supporting an end-user-facing application (e.g., a web, mobile, or desktop app) that includes UI, APIs, and backend systems

Development Experience: 2 years of general proficiency with Java or JavaScript/NodeJS

Frontend Experience: Experience with Angular, JavaScript, TypeScript, or modern web application development frameworks

Architecture Knowledge: Understanding of modular systems, performance, scalability, and security

Agile Experience: Agile development mindset and experience

Service-Oriented Architecture: Knowledge of RESTful web services, JSON, and Avro

Application Troubleshooting: De

Employment Type

Full Time
