Salary Not Disclosed
1 Vacancy
Job Title: OpenTelemetry Subject Matter Expert (SME) Consultant
Location: Remote
Duration: Long-Term Contract
Job Description:
We are seeking an experienced Subject Matter Expert (SME) in monitoring tools and OpenTelemetry who will be responsible for designing, implementing, and optimizing monitoring solutions and for leveraging OpenTelemetry to enhance observability within the Enterprise Command Center (ECC). The SME will also collaborate with the Incident Management team to troubleshoot and resolve incidents.
Key Job Functions:
Lead the design and implementation of monitoring solutions using industry-standard tools such as Splunk.
Customize monitoring configurations to align with organizational requirements.
Implement and integrate OpenTelemetry across applications and services for enhanced observability.
Optimize monitoring solutions for efficiency and accuracy, ensuring minimal impact on system performance.
Design and implement application and infrastructure performance monitoring in the AWS Cloud environment.
Create monitors and dashboards to track application and infrastructure performance.
Perform deep statistical analysis of performance data to help identify capacity and performance bottlenecks.
Configure alerting mechanisms within monitoring tools to proactively identify and address potential issues.
Develop comprehensive documentation for monitoring tool configurations, OpenTelemetry implementations, and best practices.
Provide training to incident management teams on using monitoring tools and interpreting OpenTelemetry data effectively.
Set up monitoring dashboards for incident detection and alerting.
Perform end-to-end analysis of transactions in an observability environment.
Troubleshoot incidents and quickly identify root causes using wire data analytics, application performance management (APM), and event correlation monitoring tools.
Diagnose and resolve incidents by providing factual data from the various monitoring and instrumentation systems.
Job Requirements:
Good understanding of IT cloud infrastructure, including AWS Cloud, middleware, database, storage, and/or network infrastructure.
Strong understanding of IT infrastructure, networking, and security concepts, as well as application architecture.
Hands-on experience with OpenTelemetry instrumentation and telemetry data collection.
Proven experience as a Splunk SME, with in-depth knowledge of Splunk architecture and components.
Excellent troubleshooting and problem-solving skills.
Strong documentation skills and attention to detail.
Proactive monitoring of hardware, software, and environmental alerts or malfunctions.
Analyze dashboards and monitoring tools to look for trends and patterns in application/infrastructure health and performance.
Monitor applications and infrastructure using tools such as Splunk, Dynatrace, Catchpoint, Moogsoft, xMatters, SignalFx, SolarWinds, ExtraHop, etc.
Expert understanding of microservice-based applications deployed in the cloud using AWS Lambda, ECS, Fargate, etc.
Proficiency in AWS services such as IAM roles, security groups, EC2, S3, Lambda, ALB, ECS, etc.
Experience working with AWS services such as ELB, RDS, Redshift, DynamoDB, Aurora, Route 53, Lambda, S3, Batch, CloudWatch, CloudTrail, WAF, etc.
Hands-on experience with transaction-level monitoring using Dynatrace and Splunk.
Create Splunk search queries and dashboards.
Act as the SME in identifying and onboarding new data sources into Splunk and other tools, analyzing the data for anomalies and trends, and building dashboards that highlight key trends.
Implement best-in-class engineering strategies to support a distributed, clustered Splunk environment consisting of search heads, indexers, forwarders, and the Splunk Enterprise Security (ES) app, spanning security, performance engineering, and operational roles.
Use the open-source observability framework OpenTelemetry for instrumenting, generating, collecting, and exporting telemetry data such as traces, metrics, and logs to help analyze application performance and behavior.
Use distributed tracing in an end-to-end visibility environment that consists of microservices, containers, serverless functions, and AWS Lambda.
Work closely with application teams and business stakeholders to perform troubleshooting and aid in incident triage.
Influence other technical teams on incident calls and articulate troubleshooting steps effectively.
Follow up on items that could negatively impact production operations, assist with postmortem-related activities, and support various efforts related to operational improvements.
Strong relationship management skills and the ability to multitask and work well in a high-stress environment, both within teams and independently.
Preferred Qualifications:
Familiarity with distributed tracing and logging solutions.
Knowledge of cloud platforms (AWS, Azure) and their integration with monitoring tools.
AWS Solutions Architect Associate certification or higher.
Exposure to working in an incident management environment.
Triage incidents to resolution in a 24/7/365 environment: effectively guide incident triage calls from a technical perspective, share technical details obtained from monitoring tools and dashboards to aid troubleshooting, outline details of resolution activities, provide timely status updates to stakeholders, assist with postmortem-related activities, and support various efforts related to operational improvements.
Ability to report incident details and metrics to senior leadership.
Analyze data across multiple application protocols, including web, database, and storage, as well as supporting infrastructure such as UNIX, DNS, LDAP, SSL, SMTP, and FTP.
Proficiency in scripting (UNIX/Linux shell scripting and Python), with working knowledge of JavaScript, Perl, etc. for customizing monitoring configurations.
Certification in relevant monitoring tools or OpenTelemetry is a plus.
Key Skills: AWS, Shell Scripting, Perl, Middleware, IT Infrastructure, Azure, JavaScript, Splunk, Python, UNIX, SMTP
Full Time