Manage production operating system resource availability
Perform production operating system IPL / reboot
Monitor production operating system and devices
Perform root cause analysis, resolution, and/or escalation for production systems (this does not include business applications)
Execute recovery procedures for production operating systems and devices
Execute and reply to console commands
Ensure procedures are implemented and followed
Take the appropriate predefined recovery actions for the various operational events
Restart failing components after an outage
Record and route problems to appropriate support groups
Provide operational status as required
Perform automated startup and shutdown of the production operating system
Execute restarts of production subsystem started tasks (e.g. IMS, CICS, DB2, IDMS)
Monitor subsystems (e.g. IMS, CICS, DB2, IDMS)
Managing (i.e. owning) the incident through service restoration
Validating severity classification of the problem
Determining the scope of the problem
Assessing whether the Problem Solver has determined what the problem is and whether a recovery plan has been mapped out
Assembling a SWAT team of technical support people (other levels of support across platforms as required) if the Problem Solver is unable to determine what the problem is
Facilitating the SWAT/Service Recovery Team meeting
Escalating as required
Driving problem determination activities
Driving restoration plans
Ensuring that the Location Crisis Manager for Data Center Crisis is notified of exceptional outages (every single customer outage)
Ensuring that Service Management (account team) has been contacted to confirm that the service has been restored to the customer's satisfaction (or that problems reported by a customer have been resolved)
Facilitating and/or making service restoration decisions/recommendations (engaging the Account Team as required)
Ensuring that the progression of the problem restoration and all relevant times are documented
Contributing to the outage review or RCA process as required
Ensuring that internal notification and escalation activities are executed
1. Perform system IPLs
2. Reply to WTORs
3. Perform HMC functions
4. Take action on system alerts/messages
5. Perform checklists
6. Monitor SLA regions
7. Startup/Shutdown of online regions
8. Perform SAD (stand-alone dump)
9. Understand D/R concepts and be able to execute them
10. Follow up on any high-severity issues until they are resolved
11. Be familiar with SDSF, ISPF, JCL, TSO, VTAM, CICS, DB2, $AVRS, SAR, and SYSVIEW
12. Be able to manage unscheduled outages
Remote Work: No
Full Time