MySQL 9.3 Reference Manual Including MySQL NDB Cluster 9.3
In the event that the primary Cluster replication process fails, it is possible to switch over to the secondary replication channel. The following procedure describes the steps required to accomplish this.
Obtain the time of the most recent global checkpoint (GCP).
That is, you need to determine the most recent epoch from the
ndb_apply_status
table on the replica
cluster, which can be found using the following query:
mysqlR'
>SELECT @latest:=MAX(epoch)
->FROM mysql.ndb_apply_status;
In a circular replication topology, with a source and a
replica running on each host, when you are using
ndb_log_apply_status=1
, NDB
Cluster epochs are written in the replicas' binary logs.
This means that the ndb_apply_status
table
contains information for the replica on this host as well as
for any other host which acts as a replica of the replication
source server running on this host.
In this case, you need to determine the latest epoch on this
replica to the exclusion of any epochs from any other replicas
in this replica's binary log that were not listed in the
IGNORE_SERVER_IDS
options of the
CHANGE REPLICATION SOURCE TO
statement used to set up this replica. The reason for
excluding such epochs is that rows in the
mysql.ndb_apply_status
table whose server
IDs have a match in the IGNORE_SERVER_IDS
list from the CHANGE REPLICATION SOURCE
TO
statement used to prepare this replicas's
source are also considered to be from local servers, in
addition to those having the replica's own server ID. You
can retrieve this list as
Replicate_Ignore_Server_Ids
from the output
of SHOW REPLICA STATUS
. We
assume that you have obtained this list and are substituting
it for ignore_server_ids
in the
query shown here, which like the previous version of the
query, selects the greatest epoch into a variable named
@latest
:
mysqlR'
>SELECT @latest:=MAX(epoch)
->FROM mysql.ndb_apply_status
->WHERE server_id NOT IN (
ignore_server_ids
);
In some cases, it may be simpler or more efficient (or both)
to use a list of the server IDs to be included and
server_id IN
in the
server_id_list
WHERE
condition of the preceding query.
Using the information obtained from the query shown in Step 1,
obtain the corresponding records from the
ndb_binlog_index
table on the source
cluster.
You can use the following query to obtain the needed records
from the ndb_binlog_index
table on the
source:
mysqlS'
>SELECT
->@file:=SUBSTRING_INDEX(next_file, '/', -1),
->@pos:=next_position
->FROM mysql.ndb_binlog_index
->WHERE epoch = @latest;
These are the records saved on the source since the failure of
the primary replication channel. We have employed a user
variable @latest
here to represent the
value obtained in Step 1. Of course, it is not possible for
one mysqld instance to access user
variables set on another server instance directly. These
values must be “plugged in” to the second query
manually or by an application.
You must ensure that the replica mysqld
is started with
--replica-skip-errors=ddl_exist_errors
before executing START
REPLICA
. Otherwise, replication may stop with
duplicate DDL errors.
Now it is possible to synchronize the secondary channel by running the following query on the secondary replica server:
mysqlR'
>CHANGE REPLICATION SOURCE TO
->SOURCE_LOG_FILE='@file',
->SOURCE_LOG_POS=@pos;
Again we have employed user variables (in this case
@file
and @pos
) to
represent the values obtained in Step 2 and applied in Step 3;
in practice these values must be inserted manually or using an
application that can access both of the servers involved.
@file
is a string value such as
'/var/log/mysql/replication-source-bin.00001'
,
and so must be quoted when used in SQL or application code.
However, the value represented by @pos
must not be quoted. Although MySQL
normally attempts to convert strings to numbers, this case
is an exception.
You can now initiate replication on the secondary channel by issuing the appropriate command on the secondary replica mysqld:
mysqlR'
>START REPLICA;
Once the secondary replication channel is active, you can investigate the failure of the primary and effect repairs. The precise actions required to do this depend upon the reasons for which the primary channel failed.
The secondary replication channel is to be started only if and when the primary replication channel has failed. Running multiple replication channels simultaneously can result in unwanted duplicate records being created on the replicas.
If the failure is limited to a single server, it should in theory
be possible to replicate from S
to
R'
, or from
S'
to R
.