MySQL Shell 8.4
Use this information if you need to repair a cluster in an InnoDB ClusterSet deployment. You can use the information here in any of the following situations:
A cluster in the InnoDB ClusterSet requires maintenance but has no issues with its functioning.
A cluster is functioning acceptably in the InnoDB ClusterSet deployment but has some issues, such as member servers that are offline.
A cluster is not functioning acceptably and needs to be repaired.
A cluster has been marked as invalidated during an emergency failover or controlled switchover procedure.
Section 8.7, “InnoDB ClusterSet Status and Topology” explains how to check the status of an InnoDB Cluster and of the whole InnoDB ClusterSet deployment, and the situations in which a cluster might need repair. You can identify the following situations from the output of the clusterSet.status() command:
A cluster does not have quorum (that is, not enough members are online to have a majority).
No members of a cluster can be reached.
A cluster's ClusterSet replication channel is stopped.
A cluster's ClusterSet replication channel is configured incorrectly.
A cluster's GTID set is inconsistent with the GTID set on the primary cluster in the InnoDB ClusterSet.
A cluster has been marked as invalidated. If the cluster is still online, the command warns that a split-brain situation might result.
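As a rough illustration of how these situations can be picked out programmatically, the following sketch filters a status-like object for clusters whose global status is not OK. The sample object is hand-written for illustration. Its field names (clusters, clusterRole, globalStatus) and status values are assumptions about the general shape of clusterSet.status() output and should be checked against real output from your deployment. The helper name clustersNeedingAttention is hypothetical.

```javascript
// Hand-written sample shaped like a trimmed clusterSet.status() result.
// Field names and values are assumptions for illustration, not captured output.
const sampleStatus = {
  domainName: "testclusterset",
  primaryCluster: "clusterone",
  clusters: {
    clusterone: { clusterRole: "PRIMARY", globalStatus: "OK" },
    clustertwo: { clusterRole: "REPLICA", globalStatus: "OK_NOT_CONSISTENT" },
    clusterthree: { clusterRole: "REPLICA", globalStatus: "INVALIDATED" },
  },
};

// Hypothetical helper: list every cluster whose global status is not plain OK.
function clustersNeedingAttention(status) {
  return Object.entries(status.clusters || {})
    .filter(([, info]) => info.globalStatus !== "OK")
    .map(([name, info]) => ({
      name,
      role: info.clusterRole,
      globalStatus: info.globalStatus,
    }));
}

console.log(clustersNeedingAttention(sampleStatus));
```

In MySQL Shell's JavaScript mode, the same helper could be passed the object returned by a status call on your ClusterSet object in place of the sample data.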
If the cluster is the primary cluster in the InnoDB ClusterSet deployment, before repairing it, you might need to carry out a controlled switchover or an emergency failover to demote it to a replica cluster. After that, you can take the cluster offline if necessary to repair it, and the InnoDB ClusterSet will remain available during that time.
A controlled switchover is suitable if the primary cluster is functioning acceptably but requires maintenance or has minor issues. A primary cluster that is functioning acceptably has the global status OK when you check it using the clusterSet.status() command. Section 8.8, “InnoDB ClusterSet Controlled Switchover” explains how to perform this operation.
An emergency failover is suitable if you cannot contact the primary cluster at all. Section 8.9, “InnoDB ClusterSet Emergency Failover” explains how to perform this operation.
If the primary cluster is not functioning acceptably (with the global status NOT_OK) but it can be contacted, make an attempt to repair any issues using the information in this section. An emergency failover carries the risk of losing transactions and creating a split-brain situation for the InnoDB ClusterSet. If you cannot repair the primary cluster quickly enough to restore availability, proceed with an emergency failover and then repair it if possible.
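The choice between the three approaches for a primary cluster can be summarized as a small decision function. This is only a sketch of the guidance above: the function name suggestPrimaryAction and its input flags (reachable, globalStatus, canRepairQuickly) are hypothetical and are not part of AdminAPI.

```javascript
// Hypothetical helper that restates the guidance above as code.
// The input flags are illustrative; they are not AdminAPI fields.
function suggestPrimaryAction({ reachable, globalStatus, canRepairQuickly = true }) {
  // The primary cluster cannot be contacted at all.
  if (!reachable) return "emergency failover";
  // Functioning acceptably, but maintenance or minor fixes are needed.
  if (globalStatus === "OK") return "controlled switchover";
  // Reachable but not OK: try to repair first, fail over only as a fallback.
  return canRepairQuickly ? "repair in place" : "emergency failover, then repair";
}

console.log(suggestPrimaryAction({ reachable: true, globalStatus: "OK" }));
console.log(suggestPrimaryAction({ reachable: true, globalStatus: "NOT_OK" }));
console.log(suggestPrimaryAction({ reachable: false }));
```

The repair branch comes before failover because an emergency failover risks lost transactions and a split-brain situation, so it is treated as the last resort for a cluster that can still be contacted.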
Follow this procedure to repair an InnoDB Cluster that is part of an InnoDB ClusterSet deployment:
Using MySQL Shell, connect to any member server in the primary cluster or in one of the replica clusters, using an InnoDB Cluster administrator account (created with cluster.setupAdminAccount()). You may also use the InnoDB Cluster server configuration account, which also has the required permissions. When the connection is established, get the ClusterSet object using a dba.getClusterSet() or cluster.getClusterSet() command. It is important to use an InnoDB Cluster administrator account or server configuration account so that the default user account stored in the ClusterSet object has the correct permissions. For example:
mysql-js> \connect admin2@127.0.0.1:4410
Creating a session to 'admin2@127.0.0.1:4410'
Please provide the password for 'admin2@127.0.0.1:4410': ********
Save password for 'admin2@127.0.0.1:4410'? [Y]es/[N]o/Ne[v]er (default No):
Fetching schema names for autocompletion... Press ^C to stop.
Closing old connection...
Your MySQL connection id is 42
Server version: 8.0.27-commercial MySQL Enterprise Server - Commercial
No default schema selected; type \use <schema> to set one.
<ClassicSession:admin2@127.0.0.1:4410>
mysql-js> myclusterset = dba.getClusterSet()
<ClusterSet:testclusterset>
Check the status of the whole deployment using AdminAPI's clusterSet.status() command in MySQL Shell. Use the extended option to see exactly where and what the issues are. For example:
mysql-js> myclusterset.status({extended: 1})
For an explanation of the output, see Section 8.7, “InnoDB ClusterSet Status and Topology”.
Still using an InnoDB Cluster administrator account (created with cluster.setupAdminAccount()) or InnoDB Cluster server configuration account, get the Cluster object using dba.getCluster(). You can either connect to any member server in the cluster you are repairing, or connect to any member of the InnoDB ClusterSet and use the name parameter on dba.getCluster() to specify the cluster you want. For example:
mysql-js> cluster2 = dba.getCluster()
<Cluster:clustertwo>
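As a sketch of the second option described above (connecting to any member of the InnoDB ClusterSet and passing the cluster name), the following wrapper calls dba.getCluster() with or without a name. The wrapper name getClusterForRepair and the cluster name clustertwo are illustrative. The code is meant to be pasted into MySQL Shell's JavaScript mode, where dba is a global object.

```javascript
// Hypothetical wrapper for use in MySQL Shell's JavaScript mode.
// dbaObject is expected to be the global `dba` object provided by the shell.
function getClusterForRepair(dbaObject, clusterName) {
  // With a name, request that specific cluster; without one, fall back to
  // the cluster that the connected member server belongs to.
  return clusterName
    ? dbaObject.getCluster(clusterName)
    : dbaObject.getCluster();
}

// In MySQL Shell:
//   var cluster2 = getClusterForRepair(dba, "clustertwo");
```

Passing the name explicitly avoids depending on which member server the current session happens to be connected to.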
Check the status of the cluster using AdminAPI's cluster.status() command in MySQL Shell. Use the extended option to get the most details about the cluster. For example:
mysql-js> cluster2.status({extended: 2})
For an explanation of the output, see Checking a cluster's Status with Cluster.status().
Following an emergency failover, if there is a risk of the transaction sets differing between parts of the ClusterSet, you must fence the cluster either from write traffic or from all traffic. Section 8.10.1, “Fencing Clusters in an InnoDB ClusterSet” explains how to fence and unfence a cluster. Fencing is available from MySQL Shell 8.0.28.
If the set of transactions (the GTID set) on the cluster is inconsistent, fix this first. The clusterSet.status() command warns you if a replica cluster's GTID set is inconsistent with the GTID set on the primary cluster in the InnoDB ClusterSet. A replica cluster in this state has the global status OK_NOT_CONSISTENT. You also need to check the GTID set on a former primary cluster, or a replica cluster, that has been marked as invalidated during a controlled switchover or emergency failover procedure. A cluster with extra transactions compared to the other clusters in the ClusterSet can continue to function acceptably in the ClusterSet while it stays active. However, a cluster with extra transactions cannot rejoin the ClusterSet. Section 8.10.2, “Inconsistent Transaction Sets (GTID Sets) in InnoDB ClusterSet Clusters” explains how to check for and resolve issues with the transactions on a server.
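To make the idea of "extra transactions" concrete, the following sketch computes which GTIDs appear in one GTID set but not in another. On a MySQL server this comparison is normally done with the GTID_SUBTRACT() SQL function. The code below only mirrors that logic in plain JavaScript for illustration. It assumes the classic uuid:interval[:interval] text format, does not handle tagged GTIDs, and does not merge or sort intervals. The function names and the shortened placeholder UUIDs (aaa, bbb) are hypothetical.

```javascript
// Parse "uuid:1-5:8,uuid2:3" into a Map of uuid -> array of [start, end].
function parseGtidSet(text) {
  const sets = new Map();
  for (const part of text.split(",").map((s) => s.trim()).filter(Boolean)) {
    const [uuid, ...ranges] = part.split(":");
    const intervals = sets.get(uuid) || [];
    for (const range of ranges) {
      // A single number such as "8" is treated as the interval [8, 8].
      const [start, end = start] = range.split("-").map(Number);
      intervals.push([start, end]);
    }
    sets.set(uuid, intervals);
  }
  return sets;
}

// Remove every interval in `removals` from `intervals`.
function subtractIntervals(intervals, removals) {
  let remaining = intervals.map(([s, e]) => [s, e]);
  for (const [rs, re] of removals) {
    const next = [];
    for (const [s, e] of remaining) {
      if (re < s || rs > e) {
        next.push([s, e]); // no overlap, keep as is
        continue;
      }
      if (s < rs) next.push([s, rs - 1]); // part before the removed range
      if (e > re) next.push([re + 1, e]); // part after the removed range
    }
    remaining = next;
  }
  return remaining;
}

// Return the GTIDs that are in `a` but not in `b`, as GTID set text.
function gtidSubtract(a, b) {
  const setA = parseGtidSet(a);
  const setB = parseGtidSet(b);
  const out = [];
  for (const [uuid, intervals] of setA) {
    const left = subtractIntervals(intervals, setB.get(uuid) || []);
    if (left.length > 0) {
      const ranges = left.map(([s, e]) => (s === e ? `${s}` : `${s}-${e}`));
      out.push(`${uuid}:${ranges.join(":")}`);
    }
  }
  return out.join(",");
}

// Placeholder UUIDs: the replica has transactions the primary lacks.
console.log(gtidSubtract("aaa:1-10,bbb:1-3", "aaa:1-8"));
```

An empty result means the first set contains nothing beyond the second. A non-empty result, when the first argument is the GTID set of the cluster being repaired and the second is that of the primary cluster, lists the extra transactions that would prevent the cluster from rejoining.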
If there is a technical issue with a member server in the cluster, or with the overall membership of the cluster (such as insufficient fault tolerance or a loss of quorum), you can work with individual member servers or adjust the cluster membership to resolve this. Section 8.10.3, “Repairing Member Servers and Clusters in an InnoDB ClusterSet” explains what operations are available to work with the member servers in a cluster.
If you cannot repair a cluster, you can remove it from the InnoDB ClusterSet using a clusterSet.removeCluster() command. For instructions to do this, see Section 8.10.4, “Removing a Cluster from an InnoDB ClusterSet”. A removed InnoDB Cluster cannot be added back into an InnoDB ClusterSet deployment. If you want to use the server instances in the deployment again, you will need to set up a new cluster using them.
When you have repaired a cluster or carried out the required maintenance, you can rejoin it to the InnoDB ClusterSet using a clusterSet.rejoinCluster() command. This command validates that the cluster is able to rejoin, updates and starts the ClusterSet replication channel, and removes any invalidated status from the cluster. For instructions to do this, see Section 8.10.5, “Rejoining a Cluster to an InnoDB ClusterSet”.