Site Reliability Engineer Azure

Employer Active

Job Location

Y - USA

Monthly Salary

Not Disclosed

Job Description

The ecommerce Platform Operations team is responsible for the stability, reliability, release, and deployment of our B2B & B2C ecommerce platforms. The team's primary function is to increase the efficiency of the organization through well-designed automation and infrastructure. As a Site Reliability Engineer, you will work closely with various infrastructure and application development teams to increase stability and reliability by enabling various telemetry concepts. You will also be responsible for effective operation of the ecommerce platform through efficient automation and execution of operational processes. If you're someone who doesn't mind participating in on-call support and enjoys troubleshooting production issues and implementing remediation, this position is for you!

Responsibilities:
  • Monitoring and maintaining the Development, Testing/QA, Staging, and Production environments
  • Mitigating production performance issues effectively by taking responsibility for seeing them through to resolution, with the goal of automating to prevent recurrence
  • Configure monitors, alerts, and Service Level Indicators using various telemetry technologies.
  • Create business-friendly dashboards to monitor the health of various production systems
  • Collaborate with teams within IT to implement cloud and/or hybrid systems that support the business goals
  • Monitor cloud-based systems and components for availability, performance, reliability, security, efficiency, and ability to meet non-functional requirements and service level agreements.
  • Work with Infrastructure as Code pipelines to automate the deployment of Cloud resources
  • Serve as a liaison between the application and Cloud teams, providing guidance to application teams on application container/pod deployments
  • Investigate, troubleshoot, and resolve any issues that impact the customer
  • Work to improve performance and reliability as the platform scales, driving continuous improvement through operational metrics.
  • Scale Cloud operations through best practices as applicable for configuration management, resource allocation, optimizing performance and capacity, compliance with security policies and requirements, and ensuring service-level agreements are met
  • Work with the Azure cloud engineering team to operationalize the client's cloud vision.
  • Azure Platform. Understanding of the Microsoft Azure Cloud platform, with emphasis on Azure Infrastructure solutions, including IaaS- and PaaS-based environments and Azure-based application monitoring and management
  • Technical Dialog. Lead technical sessions, making use of whiteboards or other resources to drive solution discussions and leveraging published solution architectures for common infrastructure implementations.
  • Enable proactive monitoring & alerting using Splunk log aggregation.
  • Prepare applications to work on Kubernetes, Docker, and other hosted systems
  • Work on automation using scripting and be able to integrate different tools.
  • Troubleshoot and help resolve telemetry system and software defects. Perform incident/disruption management and conduct root-cause analysis (RCA).
  • Work successfully within an Agile environment, partnering with the Scrum Master
  • Document the work done and mentor our FTEs.

Required skills:

  • Expert-level experience operating the ATG Commerce ecommerce platform, or building custom Java / Java EE customer-facing solutions in an Azure Cloud environment (AKS).
  • 3 years of Azure experience
  • Hands-on experience with containerization, Kubernetes, and microservices.
  • Experience with Cloud infrastructure and application monitoring following methodologies such as RED or USE.
    • Familiarity with APM monitoring tools such as Splunk APM, AppDynamics, and/or Azure AppInsights
    • Familiarity with infrastructure monitoring tools such as Grafana, Prometheus, Azure Monitor, and Log Analytics (KQL queries)
  • Experience with log collection tools and analysis, as well as infrastructure performance and optimization practices
  • Experience with DevOps automation platforms such as Jenkins, Artifactory, ACR, and/or Azure DevOps
  • Experience with CI/CD and with provisioning and managing Azure infrastructure
  • Participate in the after-hours on-call rotation and after-hours maintenance window activities as needed
  • Experience performing Root Cause Analysis (RCA) for application- and infrastructure-related issues
  • Solid grasp of various performance monitoring methodologies, as well as 2 years of hands-on experience configuring monitoring tools such as Azure Application Insights, New Relic, and Splunk, is required. Strong experience with other telemetry tools, including AppDynamics, ExtraHop, vSphere, SolarWinds Orion SAM, etc., will be considered.
  • Top candidate will have experience with or a thorough understanding of incident workflows (preferably using New Relic). Must have experience enriching alerts for faster root-cause detection and incident resolution.
  • Must have experience configuring monitors for business transactions, service endpoints, etc., as well as setting up health rules for triggering alerts.
  • Detailed knowledge of relational databases (e.g., MS SQL, MySQL) or NoSQL databases like Cosmos DB. Must be able to construct SQL queries and configure them with telemetry.
  • Strong scripting skills (Bash, Python, shell).
  • Self-starter with the ability to quickly learn new tools and tool features. Must be able to handle multiple tasks and priorities within a fast-paced work environment
  • Must be highly motivated and dependable, with excellent communication skills.
  • Bachelor's degree in Computer Science or another four-year degree in a relevant field is required

Preferred skills:
  • Experience using Terraform to perform infrastructure as code
  • Deep working knowledge of Azure networking, Application Gateway, APIM, IAM Policy, and network security.
  • Able to deploy and manage Azure storage.
  • Experience with Azure Active Directory management; design experience is a plus
  • Production support experience with ecommerce websites.
  • Experience with tracking, measuring, and reporting KPIs like MTBI, MTRS, MTTD, etc.

Employment Type

Full Time
