Resolving Database and Database Instance Delays

7.3 Resolving Database and Database Instance Delays

Blocker Resolver preserves the database performance by resolving delays and keeping the resources available.

Blocker Resolver Architecture
Blocker Resolver autonomously runs as a DIA0 task within the database.
Optional Configuration for Blocker Resolver
You can adjust the sensitivity, and control the size and number of the log files used by Blocker Resolver.
Blocker Resolver Diagnostics and Logging
Blocker Resolver autonomously resolves delays and continuously logs the resolutions in the database alert logs and the diagnostics in the trace files.
Using the Cluster Resource Activity Log to Monitor Cluster Resource Failures
The cluster resource activity log provides precise and specific information about a resource failure, separate from diagnostic logs.

Parent topic: Resolve Database Issues

7.3.1 Blocker Resolver Architecture

Blocker Resolver autonomously runs as a DIA0 task within the database.

Blocker Resolver works in the following three phases:

Detect: In this phase, Blocker Resolver collects the data on all the nodes and detects the sessions that are waiting for the resources held by another session.
Analyze: In this phase, Blocker Resolver analyzes the sessions detected in the Detect phase to determine if the sessions are part of a potential delay. If the sessions are suspected as delayed, Blocker Resolver then waits for a certain threshold time period to ensure that the sessions are delayed.
Verify: In this phase, after the threshold time period is up, Blocker Resolver verifies that the sessions are delayed and selects a session that's causing the delay.

After selecting the session that's causing the delay, Blocker Resolver applies resolution methods on that session. If the chain of sessions or the delay resolves automatically, then Blocker Resolver does not apply delay resolution methods. However, if the delay does not resolve by itself, then Blocker Resolver resolves the delay by terminating the session that's causing the delay. If terminating the session fails, then Blocker Resolver terminates the process of the session. This entire process is autonomous and does not block resources for a long period and does not affect the performance.

For example, if a high rank session is included in the chain of delayed sessions, then Blocker Resolver expedites the termination of the session that's causing the delay. Termination of the session that's causing the delay prevents the high rank session from waiting too long and helps to maintain performance objective of the high rank session.

Parent topic: Resolving Database and Database Instance Delays

7.3.2 Optional Configuration for Blocker Resolver

You can adjust the sensitivity, and control the size and number of the log files used by Blocker Resolver.

Note:

The DBMS_HANG_MANAGER package is deprecated in Oracle Database 23ai. Use DBMS_BLOCKER_RESOLVER instead. The DBMS_HANG_MANAGER package provides a method of changing some configuration parameters and constraints to address session issues. This package is being replaced with DBMS_BLOCKER_RESOLVER. DBMS_HANG_MANAGER can be removed in a future release.

Sensitivity

If Blocker Resolver detects a delay, then Blocker Resolver waits for a certain threshold time period to ensure that the sessions are delayed. Change threshold time period by using DBMS_BLOCKER_RESOLVER to set the sensitivity parameter to either Normal or High. If the sensitivity parameter is set to Normal, then Blocker Resolver waits for the default time period. However, if the sensitivity is set to High, then the time period is reduced by 50%.

By default, the sensitivity parameter is set to Normal. To set Blocker Resolver sensitivity, run the following commands in SQL*Plus as SYS user:

To set the sensitivity parameter to Normal:

exec dbms_blocker_resolver.set(dbms_blocker_resolver.sensitivity, dbms_blocker_resolver.sensitivity_normal);

To set the sensitivity parameter to High:

exec dbms_blocker_resolver.set(dbms_blocker_resolver.sensitivity, dbms_blocker_resolver.sensitivity_high);

Size of the Trace Log File

The Blocker Resolver logs detailed diagnostics of the delays in the trace files with _base_ in the file name. Change the size of the trace files in bytes with the base_file_size_limit parameter. Run the following command in SQL*Plus, for example, to set the trace file size limit to 100 MB:

exec dbms_blocker_resolver.set(dbms_blocker_resolver.base_file_size_limit, 104857600);

Number of Trace Log Files

The base Blocker Resolver trace files are part of a trace file set. Change the number of trace files in trace file set with the base_file_set_count parameter. Run the following command in SQL*Plus, for example, to set the number of trace files in trace file set to 6:

exec dbms_blocker_resolver.set(dbms_blocker_resolver.base_file_set_count,6);

By default, base_file_set_count parameter is set to 5.

Parent topic: Resolving Database and Database Instance Delays

7.3.3 Blocker Resolver Diagnostics and Logging

Blocker Resolver autonomously resolves delays and continuously logs the resolutions in the database alert logs and the diagnostics in the trace files.

Blocker Resolver logs the resolutions in the database alert logs as Automatic Diagnostic Repository (ADR) incidents with incident code ORA–32701.

You also get detailed diagnostics about the delay detection in the trace files. Trace files and alert logs have file names starting with database instance_dia0_.

The trace files are stored in the $ ADR_BASE/diag/rdbms/database name/database instance/incident/incdir_xxxxxx directory
The alert logs are stored in the $ ADR_BASE/diag/rdbms/database name/database instance/trace directory

Example 7-1 Blocker Resolver Trace File for a Local Instance

This example shows an example of the output you see for Blocker Resolver for the local database instance

Trace Log File .../oracle/log/diag/rdbms/hm1/hm11/incident/incdir_111/hm11_dia0_11111_i111.trc
Oracle Database 12c Enterprise Edition Release 12.2.0.1.0 - 64bit Production
...
*** 2016-07-16T12:39:02.715475-07:00
HM: Hang Statistics - only statistics with non-zero values are listed

            current number of active sessions 3
              current number of hung sessions 1
  instance health (in terms of hung sessions) 66.67%
       number of cluster-wide active sessions 9
         number of cluster-wide hung sessions 5
   cluster health (in terms of hung sessions) 44.45%

*** 2016-07-16T12:39:02.715681-07:00
Resolvable Hangs in the System
                      Root       Chain Total               Hang
   Hang Hang          Inst Root  #hung #hung  Hang   Hang  Resolution
     ID Type Status   Num  Sess   Sess  Sess  Conf   Span  Action
  ----- ---- -------- ---- ----- ----- ----- ------ ------ -------------------
      1 HANG RSLNPEND    3    44     3     5   HIGH GLOBAL Terminate Process
  Hang Resolution Reason: Although hangs of this root type are typically
    self-resolving, the previously ignored hang was automatically resolved.

Example 7-2 Error Message in the Alert Log Indicating a Delayed Session

This example shows an example of a Blocker Resolver alert log on the primary instance

2016-07-16T12:39:02.616573-07:00
Errors in file .../oracle/log/diag/rdbms/hm1/hm1/trace/hm1_dia0_i1111.trc  (incident=1111):
ORA-32701: Possible hangs up to hang ID=1 detected
Incident details in: .../oracle/log/diag/rdbms/hm1/hm1/incident/incdir_1111/hm1_dia0_11111_i1111.trc
2016-07-16T12:39:02.674061-07:00
DIA0 requesting termination of session sid:44 with serial # 23456 (ospid:34569) on instance 3
     due to a GLOBAL, HIGH confidence hang with ID=1.
     Hang Resolution Reason: Although hangs of this root type are typically
    self-resolving, the previously ignored hang was automatically resolved.
DIA0: Examine the alert log on instance 3 for session termination status of hang with ID=1.

Example 7-3 Error Message in the Alert Log Showing a Session Delay Resolved by Blocker Resolver

This example shows an example of a Blocker Resolver alert log on the local instance for resolved delays

2016-07-16T12:39:02.707822-07:00
Errors in file .../oracle/log/diag/rdbms/hm1/hm11/trace/hm11_dia0_11111.trc  (incident=169):
ORA-32701: Possible hangs up to hang ID=1 detected
Incident details in: .../oracle/log/diag/rdbms/hm1/hm11/incident/incdir_169/hm11_dia0_30676_i169.trc
2016-07-16T12:39:05.086593-07:00
DIA0 terminating blocker (ospid: 30872 sid: 44 ser#: 23456) of hang with ID = 1
     requested by master DIA0 process on instance 1
     Hang Resolution Reason: Although hangs of this root type are typically
    self-resolving, the previously ignored hang was automatically resolved.
     by terminating session sid:44 with serial # 23456 (ospid:34569)
...
DIA0 successfully terminated session sid:44 with serial # 23456 (ospid:34569) with status 0.

Parent topic: Resolving Database and Database Instance Delays

7.3.4 Using the Cluster Resource Activity Log to Monitor Cluster Resource Failures

The cluster resource activity log provides precise and specific information about a resource failure, separate from diagnostic logs.

If an Oracle Clusterware-managed resource fails, then Oracle Clusterware logs messages about the failure in the cluster resource activity log. Failures can occur as a result of a problem with a resource, a hosting node, or the network. The cluster resource activity log provides a unified view of the cause of resource failure.

Writes to the cluster resource activity log are tagged with an activity ID and any related data gets the same parent activity ID, and is nested under the parent data. For example, if Oracle Clusterware is running and you run the crsctl stop clusterware -all command, then all activities get activity IDs, and related activities are tagged with the same parent activity ID. On each node, the command creates sub-IDs under the parent IDs, and tags each of the respective activities with their corresponding activity ID. Further, each resource on the individual nodes creates sub-IDs based on the parent ID, creating a hierarchy of activity IDs. The hierarchy of activity IDs enables you to analyze the data to find specific activities.

For example, you may have many resources with complicated dependencies among each other, and with a database service. On Friday, you see that all of the resources are running on one node but when you return on Monday, every resource is on a different node, and you want to know why. Using the crsctl query calog command, you can query the cluster resource activity log for all activities involving those resources and the database service. The output provides a complete flow and you can query each sub-ID within the parent service failover ID, and see, specifically, what happened and why.

You can query any number of fields in the cluster resource activity log using filters. For example, you can query all the activities written by specific operating system users such as root. The output produced by the crsctl query calog command can be displayed in either a tabular format or in XML format.

The cluster resource activity log is an adjunct to current Oracle Clusterware logging and alert log messages.

Note:

Oracle Clusterware does not write messages that contain security-related information, such as log-in credentials, to the cluster activity log.

Parent topic: Resolving Database and Database Instance Delays