MySQL 9.3 Reference Manual Including MySQL NDB Cluster 9.3

7.5.6.3 Group Replication Resource Manager Component

The Group Replication Resource Manager component monitors secondary server lag time and memory usage, and can expel servers which lag excessively or use too many resources from the group. Allowable lag time and resource usage are configurable for both applier channels and recovery channels, as explained in this section. This component is available as part of MySQL Enterprise Edition, beginning with MySQL 9.2.0.

Before installing the Group Replication Resource Manager component, you must install the Group Replication plugin using INSTALL PLUGIN or --plugin-load-add (see Section 20.2.1.2, “Configuring an Instance for Group Replication”); otherwise, the INSTALL COMPONENT statement is rejected with an error. If you attempt to uninstall the Group Replication plugin while the Group Replication Resource Manager component is installed, UNINSTALL PLUGIN fails with the following error: Plugin 'group_replication' cannot be uninstalled now. Please uninstall the component 'component_group_replication_resource_manager' and then UNINSTALL PLUGIN group_replication.

Once the Group Replication plugin is installed, the Group Replication Resource Manager component can be installed and uninstalled using INSTALL COMPONENT and UNINSTALL COMPONENT, respectively. See the descriptions of these statements, as well as Section 7.5.1, “Installing and Uninstalling Components”, for more information.
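The installation order described above can be sketched as follows. This is a minimal illustration rather than a definitive procedure: the component URN is an assumption formed from the component name shown in the error message, and the plugin library name group_replication.so applies to Unix-like platforms (the file suffix differs on Windows).

```sql
-- Step 1: install the Group Replication plugin first (skip if it is
-- already loaded, for example through --plugin-load-add).
INSTALL PLUGIN group_replication SONAME 'group_replication.so';

-- Step 2: install the component. The URN below is an assumption based on
-- the component name 'component_group_replication_resource_manager'.
INSTALL COMPONENT 'file://component_group_replication_resource_manager';

-- Confirm which components are currently installed.
SELECT component_urn FROM mysql.component;

-- To remove both, uninstall the component before the plugin.
UNINSTALL COMPONENT 'file://component_group_replication_resource_manager';
UNINSTALL PLUGIN group_replication;
```

Reversing the last two statements would produce the UNINSTALL PLUGIN error quoted earlier in this section.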

The Group Replication Resource Manager component provides a configurable automatic expulsion mechanism which detects when the applier or recovery channel on a group replication secondary is lagging, or when the secondary is swapping excessively, and expels the problematic server from the group, thus helping to maintain high availability. Due to the high availability requirement, in order to use the auto-expulsion functionality with an active replication group, the group must initially consist of no fewer than three members, including the group replication primary; this guarantees that there are at least two members (one primary and one secondary) in the event that one member has been expelled.
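Before relying on automatic expulsion, it may be useful to confirm that the group meets the three-member minimum. The following sketch uses the Performance Schema replication_group_members table; the column alias has_three_members is illustrative only.

```sql
-- List the current members together with their state and role.
SELECT MEMBER_HOST, MEMBER_PORT, MEMBER_STATE, MEMBER_ROLE
  FROM performance_schema.replication_group_members;

-- Returns 1 if at least three members are online, 0 otherwise.
SELECT COUNT(*) >= 3 AS has_three_members
  FROM performance_schema.replication_group_members
 WHERE MEMBER_STATE = 'ONLINE';
```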

Note

The Group Replication Resource Manager component does not monitor the group replication primary, and is not intended to expel the primary, but it is possible for the decision to expel a secondary to be made just before the same secondary is promoted to primary (due to a concurrent primary failure), in which case the just-elected primary may be evicted.

Using the system and status variables provided by this component, the operator can separately monitor each of the three areas of concern (applier lag, recovery lag, and system resource exhaustion) and set a separate expulsion threshold for each of them, as listed here:

The Resource Manager component checks lag and usage on group replication secondaries every 5 seconds. This period is not configurable by the operator.

A server which has been expelled from the group may subsequently try to rejoin it without manual intervention, provided that group_replication_autorejoin_tries is set to a value greater than 0 (otherwise the server proceeds as specified by group_replication_exit_state_action). The auto-rejoin mechanism and behavior are the same as those described in Section 20.7.7.3, “Auto-Rejoin”.
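The settings that govern what an expelled member does next can be checked, and adjusted if necessary, as in the following sketch. The value 3 is an example only, not a recommendation, and the column aliases are illustrative.

```sql
-- Inspect the current auto-rejoin and exit-action settings on this member.
SELECT @@GLOBAL.group_replication_autorejoin_tries  AS autorejoin_tries,
       @@GLOBAL.group_replication_exit_state_action AS exit_state_action;

-- Allow up to 3 automatic rejoin attempts (example value), and persist
-- the setting across server restarts.
SET PERSIST group_replication_autorejoin_tries = 3;
```

With a value of 0, no automatic rejoin is attempted and the member proceeds directly to the action named by group_replication_exit_state_action.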

For a replication group member attempting to join or rejoin a group after encountering issues and being expelled, a quarantine period prevents immediate re-expulsion. This period is tracked individually for each member, so that, during the quarantine period started after group member A has been expelled and subsequently allowed to rejoin the group, member B can still be expelled if the need arises. The duration of the quarantine period is determined by the value of the group_replication_resource_manager.quarantine_time server system variable. The default length of the quarantine period is 3600 seconds (1 hour).
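The quarantine setting can be inspected as shown in this sketch. The SET PERSIST statement assumes that the variable can be changed at runtime, which this section does not state; if it cannot, the value must instead be set at server startup. The value 1800 is an arbitrary example.

```sql
-- Show all system variables provided by the component.
SHOW GLOBAL VARIABLES LIKE 'group_replication_resource_manager.%';

-- Read the quarantine period, in seconds (default 3600).
SELECT @@GLOBAL.group_replication_resource_manager.quarantine_time;

-- Assumption: the variable is dynamic. Shorten the quarantine period
-- to 30 minutes (example value) and persist it across restarts.
SET PERSIST group_replication_resource_manager.quarantine_time = 1800;
```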

The Resource Manager component provides a number of server status variables which can be used for monitoring the status of Group Replication and the Resource Manager component. In addition to the three such variables discussed previously, these include the following:

In addition, it is possible to determine if and when errors have occurred when attempting to get lag or memory usage information by checking the status variables listed here:
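One way to view the component's status variables together is to query the Performance Schema global_status table. This is a sketch only: the name pattern in the LIKE clause is an assumption about how the component's status variables are named and may need to be adjusted.

```sql
-- Assumption: the component's status variable names contain the string
-- 'resource_manager'. Adjust the pattern if no rows are returned.
SELECT VARIABLE_NAME, VARIABLE_VALUE
  FROM performance_schema.global_status
 WHERE VARIABLE_NAME LIKE '%resource_manager%'
 ORDER BY VARIABLE_NAME;
```

Since the component checks lag and usage every 5 seconds, polling these values more frequently than that is unlikely to yield new information.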

For general information about MySQL Group Replication, see Chapter 20, Group Replication.