Site Reliability Manager

Employer active

This posting is no longer available. The position may have been filled.

Job location

Vi - Sweden

Monthly salary

Not disclosed

Job description

Site Reliability Engineer (SRE)

GC / USC / GC-EAD only

Key Responsibilities:

At least 12 years of experience defining and implementing monitoring solutions, alerts, telemetry, and instrumentation for on-premises and cloud platforms in large enterprises.

The Site Reliability Engineer will play a key role in building observability and resilience capabilities on the cloud platform (Azure). The SRE's responsibilities include:

Build and configure the alerts, tracing, telemetry, and instrumentation required for infrastructure monitoring and application performance management.

Implement dashboards to monitor and share observability data at various levels (engineering teams, portfolio, senior management).

Support resilience engineering (application and infrastructure resilience) to meet availability requirements.

Work with development engineers, cloud engineers, product teams, and support engineers to gather requirements and to implement and evolve observability and resilience solutions.

Key Skillsets:

Good knowledge of observability and application performance monitoring best practices and KPIs/metrics on cloud platforms.

Experience with monitoring tools such as Splunk, Dynatrace, Prometheus, CloudWatch, Azure Monitor, New Relic, and other open-source tools.

Experience building monitoring solutions for a variety of workloads such as microservices (Java / Spring Boot desirable), databases, Kafka, and Kubernetes.

Experience in resilience engineering and implementing high availability solutions

Experience creating monitoring dashboards using tools such as Grafana (preferred), Splunk, Kibana, and Power BI.

Ability to work in a fast-paced, agile environment.

SRE Maturity Level 3 (Expectation)

DevOps Observability

o DORA Metrics are visible

Deployment frequency, Mean Time To Restore (MTTR), cycle time, change failure rate

IaC (Infrastructure as Code)

o Platforms leverage IaC

Test / Release automation

o Unit tests

Test in a vacuum

o Integration tests

o Load test results validated against SLOs

o Test run as part of CI/CD pipeline

o Automated rollback

o Business Continuity Plan for Recovering Service(s)

Capacity planning review

o Show saturation of service as compared to load test and production peak load

Product Management (Security)

o Security scanning

o Documented procedures for Vulnerability Management

o Integrated into CI/CD pipeline (partner with security)
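
The DORA metrics listed under DevOps Observability (deployment frequency, MTTR, cycle time, change failure rate) can be derived from deployment and incident records. The following is only a minimal sketch of that arithmetic: the `Deployment` record shape, its field names, and the `dora_metrics` helper are hypothetical and are not taken from the posting or from any specific tool.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class Deployment:
    # Hypothetical record shape; real pipelines expose different fields.
    committed_at: datetime                  # first commit of the change
    deployed_at: datetime                   # when the change reached production
    failed: bool                            # True if it caused an incident or rollback
    restored_at: Optional[datetime] = None  # when service was restored, if it failed


def dora_metrics(deployments: List[Deployment], window_days: int) -> dict:
    """Compute the four metrics over a reporting window of window_days."""
    if not deployments or window_days <= 0:
        raise ValueError("need at least one deployment and a positive window")

    failures = [d for d in deployments if d.failed]
    restored = [d for d in failures if d.restored_at is not None]

    # Cycle time here is assumed to mean commit-to-production time.
    cycle_times = [d.deployed_at - d.committed_at for d in deployments]
    # Restore time is measured from the failed deployment to restoration.
    restore_times = [d.restored_at - d.deployed_at for d in restored]

    return {
        "deployment_frequency_per_day": len(deployments) / window_days,
        "mean_cycle_time": sum(cycle_times, timedelta()) / len(cycle_times),
        "change_failure_rate": len(failures) / len(deployments),
        "mttr": (sum(restore_times, timedelta()) / len(restore_times))
        if restore_times
        else None,
    }


# Illustrative sample data only.
t0 = datetime(2024, 1, 1, 9, 0)
sample = [
    Deployment(t0, t0 + timedelta(hours=4), failed=False),
    Deployment(t0, t0 + timedelta(hours=8), failed=True,
               restored_at=t0 + timedelta(hours=9)),
]
metrics = dora_metrics(sample, window_days=1)
```

In practice these values would likely be fed to a dashboard rather than computed ad hoc, and the definitions of "failure" and "cycle time" would need to be agreed with the teams involved.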

SRE Maturity Level 4 (Advanced)

Modernized application

o Deployment to Kubernetes, Azure, or SaaS via CI/CD pipeline

Synthetic Monitoring

Canary / Blue Green Deployment

Self-Healing

Auto scaling

Identify KPIs for business performance

Chaos Engineering

Enterprise Process Tie-Ins

As part of RCA, problem management will review the maturity level of the incident owner.

Employment type

Full-time
