3.2.3.1 Detected Problems

The Detected Problems dashboard provides a centralized, real-time view of critical issues identified across database clusters. It helps administrators and engineers quickly assess the nature and impact of system anomalies, facilitating faster root cause analysis and resolution.

Figure 3-12 Detected Problems



This dashboard is a key tool for maintaining fleet health, reliability, and performance.

The Detected Problems page offers:
  • A summary of active and historical problems detected across clusters and nodes
  • Insights into the root cause, impact level, and problem type
  • Drilldowns and filters to accelerate issue investigation and remediation

Centralized Problem Table

The central table is sortable and filterable. For each problem, it lists:
  • Problem type (e.g., performance degradation, node eviction)
  • Affected node and cluster
  • Time of occurrence
  • Root cause and resolution status

Filtering options

Use the following filters to narrow the problem list during incident response or performance reviews:

  • Select Cluster: View problems for a specific cluster
  • Select Problem Type: Focus on a category such as Performance Issue, Node Eviction, or High CPU
  • Clear Button: Quickly reset filters to return to the full problem list
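The combined effect of the two filters can be sketched in a few lines of Python. This is only an illustration of the filtering logic, not a product API: the `Problem` record, its field names, the `filter_problems` helper, and the sample rows are all hypothetical and simply mirror the table columns described above.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class Problem:
    """Hypothetical record mirroring the problem table columns (not a product schema)."""
    problem_type: str
    cluster: str
    node: str
    occurred_at: datetime
    root_cause: str
    status: str


def filter_problems(
    problems: Iterable[Problem],
    cluster: Optional[str] = None,
    problem_type: Optional[str] = None,
) -> List[Problem]:
    """Return problems matching every filter that is set.

    Leaving a filter as None corresponds to not selecting it;
    leaving both as None corresponds to pressing Clear.
    """
    return [
        p
        for p in problems
        if (cluster is None or p.cluster == cluster)
        and (problem_type is None or p.problem_type == problem_type)
    ]


# Made-up sample rows, for illustration only.
SAMPLE = [
    Problem("High CPU", "cluster-a", "node1", datetime(2024, 1, 1, 10, 0), "unknown", "open"),
    Problem("Node Eviction", "cluster-a", "node2", datetime(2024, 1, 2, 9, 30), "unknown", "resolved"),
    Problem("High CPU", "cluster-b", "node1", datetime(2024, 1, 3, 14, 15), "unknown", "open"),
]

if __name__ == "__main__":
    for p in filter_problems(SAMPLE, cluster="cluster-a", problem_type="High CPU"):
        print(p.cluster, p.node, p.problem_type, p.status)
```

In this sketch the filters combine with a logical AND, so selecting both a cluster and a problem type narrows the list further than either filter alone. Whether the dashboard combines them the same way is an assumption.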

Use cases

  • Root cause analysis

    Drill into specific issues to trace back to their root cause and related system events, accelerating resolution.

  • Operational monitoring

    Track the frequency and patterns of recurring problems to identify systems needing immediate attention or optimization.

  • Capacity and reliability planning

    Surface early warning signs of system stress or misconfiguration, such as:
    • High memory/CPU usage
    • Jumbo frame mismatches
    • Recurring network disconnects

    These signals allow for preemptive scaling or configuration adjustments.

  • Audit and compliance

    Access historical records of issues, including:
    • Timestamps
    • System impact
    • Resolutions applied

    These records support internal audits, incident postmortems, and SLA tracking.
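
As a rough illustration of the operational monitoring use case, the following Python sketch counts how often each problem type recurs per cluster. It assumes the problem records have already been collected as (cluster, problem type) pairs; the `recurring` helper, the threshold, and the sample data are hypothetical and are not part of the product.

```python
from collections import Counter
from typing import Iterable, List, Tuple

Event = Tuple[str, str]  # (cluster, problem type); hypothetical shape

# Made-up events, for illustration only.
EVENTS: List[Event] = [
    ("cluster-a", "High CPU"),
    ("cluster-a", "High CPU"),
    ("cluster-b", "Node Eviction"),
    ("cluster-a", "High CPU"),
    ("cluster-b", "Node Eviction"),
    ("cluster-c", "Performance Issue"),
]


def recurring(events: Iterable[Event], min_count: int = 2) -> List[Tuple[Event, int]]:
    """Return (event, count) pairs seen at least min_count times, most frequent first."""
    counts = Counter(events)
    return [(key, n) for key, n in counts.most_common() if n >= min_count]


if __name__ == "__main__":
    for (cluster, problem_type), n in recurring(EVENTS):
        print(f"{cluster}: {problem_type} occurred {n} times")
```

Pairs that appear only once are dropped, so the output highlights the clusters and problem types that most likely need attention first.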