AHF Release 24.7

Store Exadata Infrastructure Details for Best Practice Checking

The ahf CLI can now store the details of Exadata Dom0s, storage servers, and switches. These stored details are subsequently used for Best Practice checks.

AHF may not discover all Exadata infrastructure when run on Dom0. As a result, Best Practice checks might miss peer Dom0s, storage servers, and switches.

The ahf CLI now provides the ability to save the details of Exadata Dom0s, storage servers, and switches, using the command: ahf configuration set --type cell --node <nodename> --password. The Best Practice checks then use this saved configuration for full infrastructure analysis.

AHF discovers peer DomUs from the Oracle Cluster Registry. By merging Oracle Exachk reports from DomU and Dom0, it provides a comprehensive report for the entire Exadata rack.

To merge Oracle Exachk reports, run exachk -merge report_1,report_2.
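The merge step can be sketched end-to-end as follows. Because exachk is only available on an Exadata system, the command is stubbed here so the sketch is runnable; the report names are hypothetical placeholders.

```shell
# Stub for illustration only; remove this function to run against the real exachk.
exachk() { echo "exachk $*"; }

# Merge a DomU report with a Dom0 report into one rack-wide report
# (the report names below are hypothetical placeholders).
exachk -merge exachk_domu_report,exachk_dom0_report
```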

Manage Cell Configuration

Note:

Configurations set through the AHF CLI are securely stored in the AHF wallet.
  • You can choose to enter a different password for each cell node.
  • The password supplied while running the ahf configuration set command overrides any previously stored password.
  • To set configuration for a specified cell:
    ahf configuration set --type cell --node <nodename> --password
  • To set configuration for all cells:
    ahf configuration set --type cell --all-nodes
  • To delete configuration of a specified cell:
    ahf configuration unset --type cell --node <nodename>
  • To delete configuration of all cells:
    ahf configuration unset --type cell --all-nodes
  • To get configuration status of a specified cell:
    ahf configuration get --type cell --node <nodename>
  • To get configuration status of all cells:
    ahf configuration get --type cell --all-nodes
  • To validate configuration of a specified cell:
    ahf configuration check --type cell --node <nodename>

    Use the --to-json flag to retrieve the configuration status in JSON format.

  • To validate configuration of all cells:
    ahf configuration check --type cell --all-nodes

    Use the --to-json flag to retrieve the configuration status in JSON format.
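When scripting against the --to-json output, the result can be consumed with standard tools. This is a minimal sketch; the JSON payload shown is an assumption for illustration, not the documented schema.

```shell
# Hypothetical payload from:
#   ahf configuration check --type cell --all-nodes --to-json
# The field names here are assumptions, not the documented schema.
json='[{"node":"cell01","status":"OK"},{"node":"cell02","status":"FAILED"}]'

# Print one "node status" line per cell.
echo "$json" | python3 -c '
import json, sys
for cell in json.load(sys.stdin):
    print(cell["node"], cell["status"])
'
```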

Related Topics

Improved Resource Usage During Compliance Checking

Oracle Orachk/Oracle Exachk now use database connection pooling for compliance checks, leading to optimized resource usage.

By default, Oracle Orachk and Oracle Exachk use a dedicated daemon process known as the SQL Agent to maintain database connection pooling, ensuring efficient and continuous query execution. If Oracle Orachk or Oracle Exachk encounters any issue with the SQL Agent, it falls back to SQL*Plus, establishing a new database connection for each query execution.

If you notice any bugs or false positives/negatives in the Oracle Orachk and Oracle Exachk logs or screen output, use the -use_sqlplus option. This option is particularly useful when the SQL Agent encounters database connection issues or errors during the Discovery or Check Execution processes, and it helps prevent service disruptions.
# orachk -use_sqlplus
# exachk -use_sqlplus
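The fallback behavior described above can be illustrated with a small sketch. The helper below is a stand-in, not real AHF internals; it only mirrors the documented logic of preferring the pooled SQL Agent and falling back to a fresh SQL*Plus connection per query.

```shell
# Stand-in for AHF's query dispatch; not real AHF internals.
sql_agent_ok=false   # simulate an unhealthy SQL Agent

run_check_query() {
  if [ "$sql_agent_ok" = true ]; then
    echo "pooled connection: $1"          # reuse the SQL Agent's pooled connection
  else
    echo "sqlplus (new connection): $1"   # dedicated connection per query
  fi
}

run_check_query "select status from v\$instance"
```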

For persistent issues, contact My Oracle Support to report and resolve the problem.

For more information about compliance checking, see Run Compliance Checks.

Update Java Without Updating AHF

With this enhancement, you can update the JRE without updating AHF.

  1. Check the Autonomous Health Framework, Oracle Trace File Analyzer, Oracle Orachk, and Java versions.
    For example:
    # ahfctl version -all
    AHF version: 24.7.0
    TFA version: 24.7.0
    ORACHK  VERSION: 24.7.0_20240714
    JAVA VERSION: 11.0.22
  2. Apply Java update.
    ahfctl applyupdate -updatefile <patch_zip>

    Where updatefile specifies the Java update file.

    For example:
    # ahfctl applyupdate -updatefile ahf_36840033_java_JDK11_MAIN_LINUX.X64_240318.11.0.23.B7.zip 
    This is a Java patch. Requires Java version comparison before proceeding.
    Java patch validation passed.
    Stopping TFA before applying JAVA Patch.
    
    Updated file /opt/oracle.ahf/jre
    Java patch applied successfully.
    Starting TFA post JAVA patch completion.
    
    .---------------------------------------------------------------------------------------------------------.
    | Host            | Status of TFA | PID   | Port  | Version    | Build ID              | Inventory Status |
    +-----------------+---------------+-------+-------+------------+-----------------------+------------------+
    | test-node       | RUNNING       | 15719 | 29063 | 24.7.0.0.0 | 240600020240715093016 | COMPLETE         |
    '-----------------+---------------+-------+-------+------------+-----------------------+------------------'
  3. After the update, check the Java version.
    For example:
    $ /opt/oracle.ahf/jre/bin/java --version
    java 11.0.23 2024-04-16 LTS
    Java(TM) SE Runtime Environment 18.9 (build 11.0.23+7-LTS-222)
  4. After the update, check the AHF, TFA, Oracle Orachk, and Java versions.
    For example:
    # ahfctl version -all
    AHF version: 24.7.0
    TFA version: 24.7.0
    ORACHK  VERSION: 24.7.0_20240714
    JAVA VERSION: 11.0.23
  • To roll back to the previous Java version, run the ahfctl rollbackupdate -updateid <update_id> command.
    For example:
    $ /opt/oracle.ahf/jre/bin/java --version
    java 11.0.23 2024-04-16 LTS
    # ahfctl rollbackupdate -updateid JDK11_MAIN_LINUX.X64_240318.11.0.23.B7
    Java rollback started.
    Rollback files includes JRE directory. TFA needs to stop first.
    ===== ===== ===== ===== ===== ===== ===== ===== ===== ===== =====
    Stopping TFA before rolling back to original.
    Rolled-back file /opt/oracle.ahf
    Starting TFA post rollback completion.
    Java rollback completed.
    $ /opt/oracle.ahf/jre/bin/java -version
    java version "11.0.21" 2023-10-17 LTS
  • To query when Java was patched, run the ahfctl queryupdate -all command.
    For example:
    # ahfctl queryupdate -all
    Java Update
    Label: JDK11_MAIN_LINUX.X64_240318.11.0.23.B7
    Status: Applied
    Applied on: Wed Jul 24 19:36:24 2024
  • To query when the Java update was rolled back, run the ahfctl queryupdate -all command.
    For example:
    # ahfctl queryupdate -all
    No AHF framework updates applied
    TFA version: 24.7.0
    ORACHK  VERSION: 24.7.0_20240714
    AHF version: 24.7.0
    JAVA VERSION: 11.0.22

Manage HAMI Trace Files with tfactl managelogs

A new command option, -hami, has been added to the tfactl managelogs command to manage HAMI trace files.

Oracle HAMI (Oracle High Availability Metadata Infrastructure) is a service that provides distributed services required by DCS, including locking and synchronization of configuration details across the cluster.

For more information about managing logs and trace files, see tfactl managelogs.
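As a sketch of how the new -hami option might combine with existing managelogs operations: tfactl is stubbed here so the example is runnable outside an AHF installation, and the exact flag combinations with -hami are assumptions based on existing managelogs usage, not documented syntax.

```shell
# Stub for illustration only; remove this function to run against the real tfactl.
tfactl() { echo "tfactl $*"; }

tfactl managelogs -show usage -hami          # report disk space used by HAMI trace files
tfactl managelogs -purge -older 30d -hami    # purge HAMI trace files older than 30 days
```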

Improved Platinum Monitoring and Patching

AHF now enables Platinum to query data from Dom0s, storage servers, and switches.

Platinum provides fault monitoring and patching services for Exadata customers, relying on AHF for Exadata configuration data.

With this enhancement, AHF offers the following capabilities on Exadata Dom0s, storage servers, and switches:
  • Auto-upgrade
  • Automatic best practice checking
  • Automatic diagnostic collections
  • Auto-upload of diagnostic collections to SRs

These features enhance the Platinum fault detection and patching service by utilizing component relationships. When a fault is detected on a Dom0, Platinum can identify the impacted database nodes. Patch planning for virtualized racks also benefits from understanding these relationships, reducing downtime.

New Problem Summaries

AHF can now detect and provide the resolution for more problems.

Since version 24.4, AHF has had the ability to detect problems and show a summary with the resolution. For more information, see Node Eviction Detection and Resolution. The Problem Summary page is available under the Detected Problems panel in Insights.

The Problem Summary contains:
  • Problem: Describes what happened.
  • Reason: Explains why it happened.
  • Cause: Identifies the root cause.
  • Evidence: Provides proof to support why this is the cause.
  • Resolution Steps: Details the exact steps to resolve the problem in simple terms.
This release includes the ability to detect the following new problem causes:
  • Node Eviction due to:
    • Archiver blocked due to insufficient space in the recovery area.
    • I/O error due to insufficient space in ASM diskgroup.
    • Private network performance degradation due to misconfigured MTU size.
  • ASM Instance Eviction due to:
    • Being stuck waiting for a failed network interface card.

For more information, see Explore Diagnostic Insights.