Automatic Recovery and Preventive Operations

As a managed service, HeatWave performs various automatic recovery and preventive operations when required.

Some of these automatic operations are:
  • Failover of a high availability DB system
  • Recovery of a defective read replica
  • Recovery of a defective secondary instance of a high availability DB system
  • Expansion of the DB system storage when automatic storage expansion is enabled

The failover of a high availability DB system happens when the existing primary instance has stopped functioning. The failover breaks all connections to the DB system and affects the availability of the DB system. The recovery of a read replica affects only the availability of the read replica being recovered; the DB system and other read replicas remain online and available. The recovery of a secondary instance and the expansion of storage are online operations that do not affect the availability of the DB system.

During these operations, the lifecycle state of the DB system is set to Updating to prevent customer actions from affecting the automatic operations. Actions such as start, stop, restart, or update are not allowed and return an error message like DbSystem '<OCID>' can not be used in '<OPERATION>' operation, because it is currently in state <LIFECYCLE_STATE>.
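
Because actions are rejected while the DB system is in the Updating state, automation scripts may want to wait for that state to clear before they retry. The following is a minimal sketch of such a wait loop, not an official client. The get_state callable is hypothetical and stands in for whatever call you use to read the lifecycle state, and the sketch assumes that the state is reported as the string UPDATING.

```python
import time
from typing import Callable

# Assumption: the Updating lifecycle state is reported as the string "UPDATING".
BLOCKING_STATE = "UPDATING"


def wait_until_not_updating(
    get_state: Callable[[], str],
    timeout_s: float = 600.0,
    poll_s: float = 15.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll get_state() until the DB system leaves the Updating state.

    get_state is a hypothetical, caller-supplied function that returns the
    current lifecycle state. Returns the first non-Updating state seen, or
    raises TimeoutError once the accumulated polling delay reaches timeout_s.
    """
    waited = 0.0
    while True:
        state = get_state().upper()
        if state != BLOCKING_STATE:
            return state
        if waited >= timeout_s:
            raise TimeoutError(f"DB system still {state} after {waited:.0f}s")
        sleep(poll_s)
        waited += poll_s
```

The sleep function is passed in as a parameter so that the loop can be exercised without real delays. The returned state should still be checked by the caller, since leaving Updating does not by itself mean that the intended action is allowed.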

Because some of these operations do not generate a customer-exposed work request, one way to verify that they occurred is through the events captured in the audit service. The failover of a high availability DB system and the recovery of a read replica or secondary instance generate the MySQL - Automatic Recovery event. The expansion of DB system storage generates the MySQL - Update DB System event.
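
The mapping from event names to operations can be sketched as a small filter over audit events that have already been retrieved. This is an illustration only. It assumes that each event is available as a dictionary, and the eventName key and the function name are hypothetical, so adjust them to the structure of the records you export.

```python
from typing import Dict, Iterable, List, Tuple

# Event names as given in the text above, mapped to the automatic
# operations that generate them.
EVENT_TO_OPERATIONS: Dict[str, List[str]] = {
    "MySQL - Automatic Recovery": [
        "failover of a high availability DB system",
        "recovery of a read replica",
        "recovery of a secondary instance",
    ],
    "MySQL - Update DB System": ["expansion of DB system storage"],
}


def match_automatic_operations(
    events: Iterable[dict], name_key: str = "eventName"
) -> List[Tuple[dict, List[str]]]:
    """Return (event, candidate operations) pairs for recognized event names.

    name_key is an assumed field name; events without it are skipped.
    """
    matches = []
    for event in events:
        operations = EVENT_TO_OPERATIONS.get(event.get(name_key, ""))
        if operations:
            matches.append((event, operations))
    return matches
```

Note that the MySQL - Automatic Recovery event is shared by three operations, so the event name alone narrows the cause down to a set of candidates rather than identifying a single operation.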