MySQL NDB Cluster 7.6 Release Notes
MySQL NDB Cluster 7.6.13 is a new release of NDB 7.6, based on MySQL Server 5.7 and including features in version 7.6 of the NDB storage engine, as well as fixing recently discovered bugs in previous NDB Cluster releases.
Obtaining NDB Cluster 7.6. NDB Cluster 7.6 source code and binaries can be obtained from https://dev.mysql.com/downloads/cluster/.
For an overview of changes made in NDB Cluster 7.6, see What is New in NDB Cluster 7.6.
This release also incorporates all bug fixes and changes made in previous NDB Cluster releases, as well as all bug fixes and feature changes which were added in mainline MySQL 5.7 through MySQL 5.7.29 (see Changes in MySQL 5.7.29 (2020-01-13, General Availability)).
Important Change: It is now possible to divide a backup into slices and to restore these in parallel, using two new options implemented for the ndb_restore utility. This makes it possible to employ multiple instances of ndb_restore to restore roughly equal-sized subsets of the backup in parallel, which should help to reduce the length of time required to restore an NDB Cluster from backup.
The --num-slices option determines the number of slices into which the backup should be divided; --slice-id provides the ID of the slice (from 0 up to one less than the number of slices) to be restored by a given ndb_restore instance. Up to 1024 slices are supported.
For more information, see the descriptions of the --num-slices and --slice-id options.
(Bug #30383937, WL #10691)
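As an illustration only (not taken from the reference documentation), the following shell sketch runs four ndb_restore processes concurrently, one per slice; the node ID, backup ID, and backup path shown are placeholders that must be replaced with values appropriate to your cluster.

    # Restore backup 10 in four slices, using one ndb_restore process per slice;
    # the node ID, backup ID, and backup path are placeholders.
    for SLICE in 0 1 2 3
    do
        ndb_restore --nodeid=1 --backupid=10 \
                    --backup-path=/var/lib/mysql-cluster/BACKUP/BACKUP-10 \
                    --restore-data \
                    --num-slices=4 --slice-id=$SLICE &
    done
    wait    # wait for all slice restores to complete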
Incompatible Change: The minimum value for the RedoOverCommitCounter data node configuration parameter has been increased from 0 to 1. The minimum value for the RedoOverCommitLimit data node configuration parameter has also been increased from 0 to 1.
You should check the cluster global configuration file and make any necessary adjustments to values set for these parameters before upgrading. (Bug #29752703)
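As a pre-upgrade sketch (the configuration file path is only an example), the following command lists any explicit settings of these parameters so that a value of 0 can be raised to at least 1 before upgrading:

    # List explicit RedoOverCommitCounter / RedoOverCommitLimit settings in the
    # global configuration file; the path shown is an example.
    grep -n -E 'RedoOverCommit(Counter|Limit)' /var/lib/mysql-cluster/config.ini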
Microsoft Windows; NDB Disk Data: On Windows, restarting a data node other than the master when using Disk Data tables led to a failure in TSMAN. (Bug #97436, Bug #30484272)
NDB Disk Data: Compatibility code for the Version 1 disk format used prior to the introduction of the Version 2 format in NDB 7.6 turned out not to be necessary, and is no longer used.
A faulty ndbrequire() introduced when implementing partial local checkpoints assumed that m_participatingLQH must be clear when receiving START_LCP_REQ, which is not necessarily true when the master fails after sending START_LCP_REQ and before handling any START_LCP_CONF signals. (Bug #30523457)
A local checkpoint sometimes hung when the master node failed while sending an LCP_COMPLETE_REP signal, so that the signal reached some nodes but not all of them. (Bug #30520818)
Execution of ndb_restore --rebuild-indexes together with the --rewrite-database and --exclude-missing-tables options did not create indexes for any tables in the target database. (Bug #30411122)
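For reference, a hypothetical invocation using the affected combination of options is shown below; the node ID, backup ID, backup path, and database names are placeholders.

    # Rebuild indexes while renaming database olddb to newdb and skipping tables
    # missing from the target; all values shown are placeholders.
    ndb_restore --nodeid=1 --backupid=10 \
                --backup-path=/var/lib/mysql-cluster/BACKUP/BACKUP-10 \
                --rebuild-indexes \
                --rewrite-database=olddb,newdb \
                --exclude-missing-tables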
When synchronizing extent pages it was possible for the current local checkpoint (LCP) to stall indefinitely if a CONTINUEB signal for handling the LCP was still outstanding when receiving the FSWRITECONF signal for the last page written in the extent synchronization page. The LCP could also be restarted if another page was written from the data pages. It was also possible that this issue caused PREP_LCP pages to be written at times when they should not have been. (Bug #30397083)
If a transaction was aborted while getting a page from the disk page buffer and the disk system was overloaded, the transaction hung indefinitely. This could also cause restarts to hang and node failure handling to fail. (Bug #30397083, Bug #30360681)
References: See also: Bug #30152258.
Data node failures with the error Another node failed during system restart... occurred during a partial restart. (Bug #30368622)
If a SYNC_EXTENT_PAGES_REQ signal was received by PGMAN while dropping a log file group as part of a partial local checkpoint, and thus dropping the page locked by this block for processing next, the LCP terminated due to trying to access the page after it had already been dropped. (Bug #30305315)
The wrong number of bytes was reported in the cluster log for a completed local checkpoint. (Bug #30274618)
References: See also: Bug #29942998.
The number of data bytes for the summary event written in the cluster log when a backup completed was truncated to 32 bits, so that there was a significant mismatch between the number of log records and the number of data records printed in the log for this event. (Bug #29942998)
Using 2 LDM threads on a 2-node cluster with 10 threads per node could result in a partition imbalance, such that one of the LDM threads on each node was the primary for zero fragments. Trying to restore a multi-threaded backup from this cluster failed because the datafile for one LDM contained only the 12-byte data file header, which ndb_restore was unable to read. The same problem could occur in other cases, such as when taking a backup immediately after adding an empty node online.
It was found that this occurred when ODirect was enabled for an EOF backup data file write whose size was less than 512 bytes and the backup was in the STOPPING state. This normally occurs only for an aborted backup, but could also happen for a successful backup for which an LDM had no fragments. The issue is fixed by introducing an additional check to ensure that writes are skipped only if the backup actually contains an error which should cause it to abort. (Bug #29892660)
References: See also: Bug #30371389.
In some cases the SignalSender class, used as part of the implementation of ndb_mgmd and ndbinfo, buffered excessive numbers of unneeded SUB_GCP_COMPLETE_REP and API_REGCONF signals, leading to unnecessary consumption of memory. (Bug #29520353)
References: See also: Bug #20075747, Bug #29474136.
The setting for the BackupLogBufferSize configuration parameter was not honored. (Bug #29415012)
The maximum global checkpoint (GCP) commit lag and GCP save timeout are recalculated whenever a node shuts down, to take into account the change in number of data nodes. This could lead to the unintentional shutdown of a viable node when the threshold decreased below the previous value. (Bug #27664092)
References: See also: Bug #26364729.
A transaction which inserts a child row may run concurrently with a transaction which deletes the parent row for that child. One of the transactions should be aborted in this case, lest an orphaned child row result.
Before committing an insert on a child row, a read of the parent row is triggered to confirm that the parent exists. Similarly, before committing a delete on a parent row, a read or scan is performed to confirm that no child rows exist. When insert and delete transactions were run concurrently, their prepare and commit operations could interact in such a way that both transactions committed. This occurred because the triggered reads were performed using LM_CommittedRead locks (see NdbOperation::LockMode), which are not strong enough to prevent such error scenarios.
This problem is fixed by using the stronger LM_SimpleRead lock mode for both triggered reads. The use of LM_SimpleRead rather than LM_CommittedRead locks ensures that at least one transaction aborts in every possible scenario involving transactions that concurrently insert a child row and delete its parent row.
(Bug #22180583)
Concurrent SELECT and ALTER TABLE statements on the same SQL node could sometimes block one another while waiting for locks to be released. (Bug #17812505, Bug #30383887)