MySQL NDB Cluster 7.5 Release Notes
MySQL NDB ClusterJ: ClusterJ could not be built on Ubuntu 22.10 with GCC 12.2. (Bug #34666985)
In some contexts, a data node process may be sent SIGCHLD by other processes. Previously, the data node process bound a signal handler treating this signal as an error, which could cause the process to shut down unexpectedly when run in the foreground in a Kubernetes environment (and possibly under other conditions as well). This occurred despite the fact that a data node process never starts child processes itself, and thus there is no need to take action in such cases.
To fix this, the handler has been modified to use SIG_IGN, which should result in cleanup of any child processes. mysqld and ndb_mgmd processes do not bind any handlers for SIGCHLD.
(Bug #34826194)
The running node from a node group scans each fragment (CopyFrag) and sends the rows to the starting peer in order to synchronize it. If a row from the fragment is locked exclusively by a user transaction, it blocks the scan from reading the fragment, causing the CopyFrag to stall.
If the starting node fails during the CopyFrag phase, normal node failure handling takes place. The coordinator node's transaction coordinator (TC) performs TC takeover of the user transactions from the TCs on the failed node. Since the scan that aids copying the fragment data over to the starting node is considered internal only, it is not a candidate for takeover; thus the takeover TC marks the CopyFrag scan as closed at the next opportunity, and waits until it is closed.
The current issue arose when the CopyFrag scan was in the waiting-for-row-lock state, and the closing of the marked scan was not performed. This led to TC takeover stalling while waiting for the close, causing unfinished node failure handling, and eventually a GCP stall potentially affecting redo logging, local checkpoints, and NDB Replication.
We fix this by closing the marked CopyFrag scan whenever a node failure occurs while the CopyFrag is waiting for a row lock.
(Bug #34823988)
References: See also: Bug #35037327.
In certain cases, invalid signal data was not handled correctly. (Bug #34787608)
Following execution of DROP NODEGROUP in the management client, attempting to create or alter an NDB table specifying an explicit number of partitions or using MAX_ROWS was rejected with Got error 771 'Given NODEGROUP doesn't exist in this cluster' from NDB.
(Bug #34649576)
In a cluster with multiple management nodes, when one management node connected and later disconnected, any remaining management nodes were not aware of this node and were eventually forced to shut down when stopped nodes reconnected; this happened whenever the cluster still had live data nodes.
On investigation it was found that node disconnection handling was done in the NF_COMPLETEREP path in ConfigManager, but the expected NF_COMPLETEREP signal never actually arrived.
We solve this by handling disconnecting management nodes when the NODE_FAILREP signal arrives, rather than waiting for NF_COMPLETEREP.
(Bug #34582919)
When reorganizing a table with ALTER TABLE ... REORGANIZE PARTITION following addition of new data nodes to the cluster, unique hash indexes were not redistributed properly.
(Bug #30049013)
During a rolling restart of a cluster with two data nodes, one of them refused to start, reporting that the redo log fragment file size did not match the configured one and that an initial start of the node was required. Fixed by addressing a previously unhandled error returned by fsync(), and retrying the write.
(Bug #28674694)
A data node could hit an overly strict assertion when the thread liveness watchdog triggered while the node was already shutting down. We fix the issue by relaxing this assertion in such cases. (Bug #22159697)
Removed a leak of long message buffer memory that occurred each time an index was scanned for updating index statistics. (Bug #108043, Bug #34568135)
Fixed an uninitialized variable in Suma.cpp.
(Bug #106081, Bug #33764143)