SRE Engineer

This posting is no longer available. The position may have been filled.

Monthly salary: Not disclosed

Job Description

Role: SRE Engineer

Location: Austin, TX

Type: Full-time

Who are we looking for?

An application SRE with 8+ years of overall experience developing and supporting complex, critical, large-scale distributed systems, and extensive hands-on experience handling production failures and driving root cause analysis and remediation.

Primary Responsibilities:

Effectively handle production outages and performance issues with quality analysis and quick resolution.

Manage incidents and communicate effectively with users, application owners, and senior stakeholders across all areas.

Work with development teams to improve applications' operational features for faster MTTD and MTTR and for automatic recovery.

Identify and analyze patterns of incidents and problems, conduct flawless post-mortems, develop permanent remediation plans, and implement automation to prevent incidents from recurring.

Identify alerts and processes that can be automated, then work with the Engineering team to automate them.

Build and improve runbooks for generalists to minimize operational errors and gain fungibility and efficiency.

Build E2E monitoring of the system (hardware, availability, logging, distributed tracing, business transactions) as well as end-user experience monitoring, using APM tools such as Splunk, AppDynamics, and ThousandEyes, working as a developer/configurator for performance diagnostics, monitoring, alerting, and dashboarding.

Strong understanding of deployment methodologies and hands-on experience in production deployments.

Develop self-healing solutions for recurring infrastructure and service failures.

Minimize manual involvement by driving solutions, automation, and continuous improvements to the operating environment, including development and configuration for dynamic monitoring, alerting, and recovery.

Technical Skills:

Minimum of 8 years of IT experience, including at least 3 years of web application production support.

Comfortable with large-scale production systems and technologies, for example load balancing, monitoring, distributed systems, microservices, and configuration management.

Solid hands-on experience troubleshooting and fixing application failures, application performance degradation, code issues, cloud platform issues, batch failures, MongoDB failures, and network failures.

ITIL working knowledge: Event, Incident, Release, Problem and Knowledge Management.

Experience with instrumentation, monitoring, alerting, and response related to application performance and availability, using tools such as AppDynamics, Splunk, ThousandEyes, and ITRS.

Experience in administration of Linux servers, networking, and load balancing.

Clear understanding of one or more cloud platforms (PCF, GCP, AWS, Azure, or others).

Hands-on experience performing production deployments using tools such as cf-cli and Bamboo.

Hands-on experience in CI/CD implementation.

Qualification:

Educational qualification: B.Tech, BE, BCA, MCA, M.Tech, or an equivalent technical degree from a reputed college.

Employment Type

Full-time

About the Company

100 employees