SRE Engineer

The job posting is outdated and position may be filled
Job Location: others - USA

Monthly Salary: Not Disclosed

Job Description

Role: SRE Engineer

Loc: Austin, TX

Type: Full-time

Who are we looking for?

An Application SRE with 8+ years of overall experience developing and supporting complex, critical, large-scale distributed systems, and extensive hands-on experience handling production failures and driving root cause analysis and remediation.

Primary Responsibilities:

Effectively handle production outages and performance issues with quality analysis and quick resolution.

Manage incidents and effectively communicate with users, application owners and senior stakeholders across all areas.

Work with development teams to improve applications' operational features for faster MTTD and MTTR and for auto-recovery.

Identify and analyze patterns of incidents and problems, conduct flawless post-mortems, develop permanent remediation plans, and implement automation to prevent incidents from recurring.

Identify alerts/processes that can be automated, then work with the Engineering team to automate them.

Build and improve runbooks for generalists to minimize operational errors and gain fungibility and efficiency.

Build E2E monitoring (hardware, availability, logging, distributed tracing, business transactions) of the system, as well as end-user experience monitoring, using APM tools such as Splunk, AppDynamics, and ThousandEyes, working as a developer/configurator for performance diagnostics, monitoring, alerting, and dashboarding.

Strong understanding of deployment methodologies and hands on experience in production deployments.

Develop self-healing solutions for repeated infrastructure and service failures.

Minimize manual involvement by driving solutions and automation and implementing continuous improvements to the operating environment, including development and configuration for dynamic monitoring, alerting, and recovery.

Technical Skills:

Minimum 8 years of IT experience, including at least 3 years of web application production support.

Comfortable with large-scale production systems and technologies, for example load balancing, monitoring, distributed systems, microservices, and configuration management.

Solid hands-on experience troubleshooting and fixing application failures, application performance degradation, code issues, cloud platform issues, batch failures, MongoDB failures, and network failures.

ITIL working knowledge: Event, Incident, Release, Problem and Knowledge Management.

Experience with instrumentation, monitoring, alerting, and responding, as related to application performance and availability, using tools such as AppDynamics, Splunk, ThousandEyes, and ITRS.

Experience in administration of Linux servers, networking, and load balancing.

Clear understanding of one or more cloud systems (PCF, GCP, AWS, Azure Cloud, or others).

Hands-on experience performing production deployments using tools such as cf-cli and Bamboo.

Hands-on experience in CI/CD implementation.

Qualification:

Education qualification: B.Tech, BE, BCA, MCA, M.Tech, or an equivalent technical degree from a reputed college.

Employment Type

Full Time

About Company

100 employees