Typical Job System Issues
Below are some of the top issues that can affect Job System performance:
- Agent is Down, Unknown, or Suspended in Blackout
- Agent is overloaded, resulting in excessive job retries (Metric Extensions are a frequent cause)
- Priority jobs are getting starved due to failing System Retry Jobs
- DB session hang due to repository background process deadlocks
- OMS UI console to PBS communication failure
- Corrective Actions trigger too frequently due to incorrect metric threshold settings
- User-suspended jobs are locking resources
- Long-running jobs are blocking common Job System resources, preventing new jobs from running
- Job backlog due to a stuck job at the head of the queue
The job diagnostics dashboard allows administrators to identify the above issues, diagnose the root cause, and take appropriate action.
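
To make the "stuck head of the queue" issue concrete, the following is a minimal toy model, not the Job System's actual dispatcher. It assumes a strict first-in, first-out queue in which only the head job may run; the function name `run_fifo`, the job ids, and the notion of a "tick" are all hypothetical and used only for illustration.

```python
from collections import deque


def run_fifo(queue, stuck_ids, ticks):
    """Toy strict-FIFO dispatcher: only the head job may run on each tick.

    A job whose id is in stuck_ids never completes, so it is never dequeued.
    Returns (completed job ids, job ids still waiting).
    """
    pending = deque(queue)
    completed = []
    for _ in range(ticks):
        if not pending:
            break
        head = pending[0]
        if head in stuck_ids:
            # The head never finishes, so every job behind it keeps waiting.
            continue
        completed.append(pending.popleft())
    return completed, list(pending)


# Hypothetical job ids; "j2" stands in for a job that never finishes.
done, backlog = run_fifo(["j1", "j2", "j3", "j4"], stuck_ids={"j2"}, ticks=10)
print(done)     # ['j1']
print(backlog)  # ['j2', 'j3', 'j4']
```

In this sketch a single stuck job holds back every job queued behind it, no matter how many ticks elapse, which is the backlog pattern described in the list above.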