Understanding the Self-Health Metrics
Oracle Communications Unified Assurance pollers, collectors, and threshold engines save self-health metrics on the performance of each individual application. Administrators can use the collected self-health metrics to monitor overall performance and as an early indicator of potential issues, such as overloaded poll cycles, unbalanced polling clusters, database contention, and latency. These metrics allow administrators to make informed decisions about the overall Unified Assurance installation and to maintain application efficiency and performance, for example by adding pollers to the cluster to spread the polling load.
Some of the self-health metrics include:
- Average Db Time - The average time taken to insert data into the database per poll cycle.
- Average Poll Time - The average amount of time taken to poll a single device for data per poll cycle.
- Database Queue Length - The number of metrics in the DB queue after the poll cycle ends.
- Poll Duration - The time the poll cycle took to complete per cycle.
- Poll Queue Length - The number of devices in the queue before the next poll period starts.
- Polled Devices - The number of devices polled by the application. For threshold engines, this number will be the number of metrics processed.
Viewing Self-Health Metrics
The self-health metrics are viewable from the metric tab in the collector's Device Overview dashboard. The Unified Assurance tab displays the collected list of performance metrics per Unified Assurance application running on the server.
The tab works like the other metric tabs: clicking a row drills down to the graph for that individual metric.
Troubleshooting
- Average Db Time - If this value is extremely high, there may be high network latency, other networking issues, or high database contention for inserts, particularly if this happens consistently.
- Average Poll Time - If this value is extremely high, you may have latency or network issues when communicating with polled devices.
- Database Queue Length - This number may vary. The value should stay relatively steady, or increase slightly over time as additional devices or metrics are polled. Large spikes or a constant increase may point to a DB connectivity issue or too few database threads to handle the number of metrics being inserted.
- Poll Duration - Generally, a value greater than half of the configured poll time indicates an overloaded poller. Adjust the poller to give it more threads, split it into a cluster, or add pollers to the cluster.
- Poll Queue Length - This value should be zero or stay consistent. If the value is increasing over time, the application may be falling behind (starting a second poll before the first has finished); check the Poll Duration metric. Pollers that fall behind and are left unchecked can cause issues with collection and delay metric insertion.
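The Poll Duration and Poll Queue Length guidance above can be expressed as two simple checks. The following Python sketch is illustrative only and is not part of Unified Assurance: the function names, the sample values, and the assumption that the metrics have already been exported as plain numbers (in seconds and device counts) are all hypothetical.

```python
from typing import Sequence


def is_poller_overloaded(poll_duration_s: float, poll_interval_s: float) -> bool:
    """Return True when a poll cycle took more than half of the configured poll time."""
    return poll_duration_s > poll_interval_s / 2


def is_steadily_increasing(samples: Sequence[float], min_samples: int = 3) -> bool:
    """Return True when every sample is larger than the one before it.

    Fewer than min_samples readings are treated as too little data to call a trend.
    """
    if len(samples) < min_samples:
        return False
    return all(later > earlier for earlier, later in zip(samples, samples[1:]))


# Hypothetical readings for a poller configured with a 300-second poll time.
print(is_poller_overloaded(poll_duration_s=180.0, poll_interval_s=300.0))  # True
# Hypothetical Poll Queue Length values from consecutive poll cycles.
print(is_steadily_increasing([0, 4, 9, 15]))  # True
print(is_steadily_increasing([0, 0, 2, 0]))   # False
```

A strictly increasing queue is a deliberately strict test; in practice you may prefer to compare averages over longer windows so that a single quiet cycle does not hide a longer upward trend.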