Sr System Reliability Engineer Application Support

Fulcrum Digital

Posted on: 10-04-2024

Employer Active

1 Vacancy

Jobs by Experience

6 years

Job Location

Missouri - USA

Monthly Salary

Not Disclosed

Job Description

Who are we

Fulcrum Digital is an agile, next-generation digital accelerating company providing digital transformation and technology services, from ideation to implementation. These services apply across a variety of industries, including banking and financial services, insurance, retail, higher education, food, health care, and manufacturing.

The Role

Provide L2 support for production systems such as application, database, and middleware components, as well as infrastructure and network components.
Manage production incidents end-to-end within defined SLAs, focusing on resolution rather than on who caused the incident.
Interact with various stakeholders such as release managers, program leads, service managers, and development and test leads.
Review operational readiness requirements such as monitoring and alerting, log rotation, and component resilience, and report the gaps.
Provide pre-implementation support with activities such as release notes review and implementation dry runs.
Protect production components by running health checks and monitoring latency and memory utilization.
Automate day-to-day activities and propose changes that improve reliability.
Participate in the Change Advisory Board (CAB) and provide feedback on change requests.
Support the DevOps team in testing the promote pipelines and suggest automation of configuration items.
Follow incident management best practices and perform root cause analysis (RCA).
Participate in disaster recovery tests and operational acceptance tests.
Analyze the technology stack that makes up the product and optimize the recovery time objective.
Work with team members spread across time zones.
Share knowledge, document improvements, and mentor junior resources.

Requirements

Responsibility Matrix

Deployments MTF/Prod
Maintenance items (including stop/start, Disaster Recovery-related activities, etc.)
Monitoring
Support TRTs
Incident creation
CR for changes in MTF/Prod

Tools

Log monitoring tool: Splunk
Application monitoring tool: Dynatrace
Ticketing and incident/problem management tool: Remedy
Linux
SQL
DevOps basics: CI/CD basics, overview of Git, Bitbucket, SonarQube, Fortify, CI (Jenkins), ARA, SaltStack, Chef, Artifactory, MC DevOps tool chain

Skills

Linux & Shell Scripting
ITIL / ITSM
PL/SQL
Application Troubleshooting
Monitoring tools: Splunk (preferred), Dynatrace (preferred), or any other monitoring tool
Jenkins CI/CD
Groovy
Any cloud: AWS / Azure / PCF
Git basics / Bitbucket
Event framework architecture (good to have)
Ansible/Chef

MQ/EB

Understanding of event-driven architectures.
Distributed systems: how clusters are formed, quorum management, failure handling.
3 to 5 years of hands-on experience with MQ, NATS broker, or similar messaging solutions.
An understanding of Kafka clustering would be good to have.
Knowledge of client-server communication aspects: sockets, TLS protocol, etc.
Understanding of the concept of regions and AZs.

Skills using Jenkins to orchestrate builds and link to Sonar, Maven, etc. to build out the CI/CD pipeline are good to have.
Support deployments of code into multiple lower environments.
Support current processes as needed, with an emphasis on automating everything as soon as possible.
The ability to design, implement, and enhance our deployment automation based on Chef is good to have.
Proven experience designing and implementing an overall release and deployment process is required.
The ability to design and implement a Git-based code management strategy that supports multiple environment deployments in parallel is good to have.
Experience with automation for branch management, code promotions, and version management.
Engage in and improve the whole lifecycle of services, from inception and design through deployment, operation, and refinement.

Additional skills: good knowledge of Nginx, Groovy scripting / YAML, Java / JEE.

Employment Type

Full Time

About Company

Fulcrum Digital
