7.1 Resolve Problems AHF has Detected

  1. Log in to the machine where the issue occurred and, as the Oracle user, run the following command to generate a diagnostic collection.
    tfactl diagcollect

    Autonomous Health Framework prompts you with a series of questions so that it can collect all the necessary diagnostics.

  2. Transfer the diagnostic zip from the machine where you initiated the collection to a machine with a web browser, and unzip it.
  3. Inside the unzipped collection is another zip containing Autonomous Health Framework Insights. Extract it and open index.html.

    Figure 7-1 AHF Insights (AHF Insights zip archive)

    Figure 7-2 AHF Insights (AHF Insights index.html file)

    Figure 7-3 Detected Problems

  4. Click Detected Problems.

    The Detected Problems page displays the list of problems detected.

    Figure 7-4 Detected Problems

    Note: The Detected Problems section is created only when AHF Insights detects problems.
  5. Click Show to view the details.

    Figure 7-5 Detected Problems Details

    The Problem Summary contains:
    • Problem: Describes what happened.
    • Reason: Explains why it happened.
    • Cause: Identifies the root cause.
    • Evidence: Provides the data that supports the identified cause. Evidence sections are expandable to show the underlying data.
    • Resolution Steps: Explains the exact steps to resolve the problem in simple terms.
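
Steps 2 and 3 above (unzip the collection, extract the nested Insights zip, locate index.html) can also be scripted. The following is a minimal Python sketch, not part of AHF. It assumes only what the steps describe: the collection zip contains another zip that carries an index.html. The function name extract_insights_report and the choice to identify the Insights zip by the presence of index.html (rather than by file name, which this section does not specify) are illustrative assumptions.

```python
import zipfile
from pathlib import Path


def extract_insights_report(collection_zip, dest_dir):
    """Unzip a diagnostic collection and return the path to the first
    index.html found inside a nested zip, or None if there is none.

    Hypothetical helper: the nested zip is recognized only by the fact
    that it contains an index.html, which is an assumption.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)

    # Step 2: unzip the diagnostic collection.
    with zipfile.ZipFile(collection_zip) as outer:
        outer.extractall(dest)

    # Step 3: find a nested zip that carries an index.html and extract it.
    for nested in sorted(dest.rglob("*.zip")):
        with zipfile.ZipFile(nested) as inner:
            names = inner.namelist()
            if not any(Path(name).name == "index.html" for name in names):
                continue
            # Extract into a directory next to the nested zip.
            target = nested.with_suffix("")
            inner.extractall(target)
        hits = sorted(target.rglob("index.html"))
        if hits:
            return hits[0]
    return None
```

To open the report in the default browser, pass the returned path to the standard library, for example webbrowser.open(path.resolve().as_uri()). Only run this against collections you trust, because it extracts every file in both archives.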
Problem Causes
  • Node evictions
  • Instance evictions
  • Database slow performance
  • Poor configuration:
    • Jumbo frames
    • UDP buffers
    • IP reassembly buffer
    • HugePages
    • NIC buffer size
    • NIC flow control misconfiguration
    • Insufficient DBWR processes
    • Message buffers in the network interfaces too small
    • DB Writer
    • PGA limit
    • Misconfiguration of RDS/IB network settings
    • Archiver configuration
  • Resource bottlenecks:
    • High CPU steal
    • NIC unavailable
    • Critical background processes stuck in D state
    • Increasing memory usage of Grid Infrastructure processes
    • Increasing memory usage of database processes
    • Increasing memory usage of non-database processes
    • Increasing memory by new databases
    • DB Recovery Read I/O
    • Latch contention
    • Archiver blocked
    • Insufficient redo log size
  • Resource errors:
    • IP reassembly failures
    • Multipath disk failures
    • I/O errors due to insufficient storage space
    • Generic I/O errors