MySQL NDB Cluster 7.6 Release Notes
MySQL NDB Cluster 7.6.8 is a new release of NDB 7.6, based on MySQL Server 5.7 and including features in version 7.6 of the NDB storage engine, as well as fixing recently discovered bugs in previous NDB Cluster releases.
Obtaining NDB Cluster 7.6. NDB Cluster 7.6 source code and binaries can be obtained from https://dev.mysql.com/downloads/cluster/.
For an overview of changes made in NDB Cluster 7.6, see What is New in NDB Cluster 7.6.
This release also incorporates all bug fixes and changes made in previous NDB Cluster releases, as well as all bug fixes and feature changes which were added in mainline MySQL 5.7 through MySQL 5.7.24 (see Changes in MySQL 5.7.24 (2018-10-22, General Availability)).
Performance: This release introduces a number of significant improvements in the performance of scans; these are listed here:
Row checksums help detect hardware issues, but do so at the expense of performance. NDB now offers the possibility of disabling these by setting the new ndb_row_checksum server system variable to 0; doing this means that row checksums are not used for new or altered tables. This can have a significant impact (5 to 10 percent, in some cases) on performance for all types of queries. This variable is set to 1 by default, to provide compatibility with the previous behavior. (An illustrative example follows this list.)
A query consisting of a scan can execute for a longer time in the LDM threads when the queue is not busy.
Previously, columns were read before checking a pushed condition; now checking of a pushed condition is done before reading any columns.
Performance of pushed joins should see significant improvement when using range scans as part of join execution.
(WL #11722)
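As an illustration of the new ndb_row_checksum variable described in the first item above, here is a minimal sketch, assuming the variable can be set from a client session before creating or altering tables; the table shown is hypothetical:

    -- Disable row checksums for tables created or altered from this point on.
    SET ndb_row_checksum = 0;

    -- Hypothetical table; created without row checksums.
    CREATE TABLE t_no_checksum (
        id  INT NOT NULL PRIMARY KEY,
        val VARCHAR(255)
    ) ENGINE=NDB;

    -- Restore the default so that later tables keep row checksums.
    SET ndb_row_checksum = 1;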
Packaging: Expected NDB header files were in the devel RPM package instead of libndbclient-devel. (Bug #84580, Bug #26448330)
NDB Disk Data: While restoring a local checkpoint, it is possible to insert a row that already exists in the database; this is expected behavior, which is handled by deleting the existing row first, then inserting the new copy of that row. In some cases involving data on disk, NDB failed to delete the existing row. (Bug #91627, Bug #28341843)
NDB Client Programs: Removed a memory leak in NdbImportUtil::RangeList that was revealed in ASAN builds. (Bug #91479, Bug #28264144)
MySQL NDB ClusterJ: When a table containing a BLOB or a TEXT field was queried with ClusterJ for a record that did not exist, an exception (“The method is not valid in current blob state”) was thrown. (Bug #28536926)
MySQL NDB ClusterJ: A NullPointerException was thrown when a full table scan was performed with ClusterJ on tables containing either a BLOB or a TEXT field. This was because the proper object initializations were omitted; they have now been added by this fix. (Bug #28199372, Bug #91242)
When copying deleted rows from a live node to a node just starting, it is possible for one or more of these rows to have a global checkpoint index equal to zero. If this happened at the same time that a full local checkpoint was started due to the undo log getting full, the LCP_SKIP bit was set for a row having GCI = 0, leading to an unplanned shutdown of the data node. (Bug #28372628)
ndbmtd sometimes experienced a hang when exiting due to log thread shutdown. (Bug #28027150)
When the SUMA kernel block receives a SUB_STOP_REQ signal, it executes the signal and then replies with SUB_STOP_CONF. (After this response is relayed back to the API, the API is open to send more SUB_STOP_REQ signals.) After sending the SUB_STOP_CONF, SUMA drops the subscription if no subscribers are present, which involves sending multiple DROP_TRIG_IMPL_REQ messages to DBTUP. LocalProxy can handle up to 21 of these requests in parallel; any more than this are queued in the Short Time Queue. When execution of a DROP_TRIG_IMPL_REQ was delayed, there was a chance for the queue to become overloaded, leading to a data node shutdown with Error in short time queue.

This issue is fixed by delaying the execution of the SUB_STOP_REQ signal if DBTUP is already handling DROP_TRIG_IMPL_REQ signals at full capacity, rather than queueing up the DROP_TRIG_IMPL_REQ signals. (Bug #26574003)
Having a large number of deferred triggers could sometimes lead to job buffer exhaustion. This could occur due to the fact that a single trigger can execute many operations—for example, a foreign key parent trigger may perform operations on multiple matching child table rows—and that a row operation on a base table can execute multiple triggers. In such cases, row operations are executed in batches. When execution of many triggers was deferred—meaning that all deferred triggers are executed at pre-commit—the resulting concurrent execution of a great many trigger operations could cause the data node job buffer or send buffer to be exhausted, leading to failure of the node.
This issue is fixed by limiting the number of concurrent trigger operations as well as the number of trigger fire requests outstanding per transaction.
For immediate triggers, limiting of concurrent trigger operations may increase the number of triggers waiting to be executed, exhausting the trigger record pool and resulting in the error Too many concurrently fired triggers (increase MaxNoOfFiredTriggers). This can be avoided by increasing MaxNoOfFiredTriggers, reducing the user transaction batch size, or both. (Bug #22529864)
References: See also: Bug #18229003, Bug #27310330.
ndbout and ndberr became invalid after exiting from mgmd_run(), and redirecting to them before the next call to mgmd_run() caused a segmentation fault during an ndb_mgmd service restart. This fix ensures that ndbout and ndberr remain valid at all times. (Bug #17732772, Bug #28536919)
Running out of undo log buffer memory was reported using error 921 Out of transaction memory ... (increase SharedGlobalMemory).
This problem is fixed by introducing a new error code 923 Out of undo buffer memory (increase UNDO_BUFFER_SIZE). (Bug #92125, Bug #28537319)
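For reference, the undo buffer size is specified when the Disk Data log file group is created; a minimal sketch, with hypothetical names and sizes:

    -- Create a Disk Data log file group with a larger undo buffer.
    -- The group name, file name, and sizes shown here are illustrative only.
    CREATE LOGFILE GROUP lg_1
        ADD UNDOFILE 'undo_1.log'
        INITIAL_SIZE 128M
        UNDO_BUFFER_SIZE 32M
        ENGINE NDBCLUSTER;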
When moving an OperationRec from the serial to the parallel queue, Dbacc::startNext() failed to update the Operationrec::OP_ACC_LOCK_MODE flag, which is required to reflect the accumulated OP_LOCK_MODE of all previous operations in the parallel queue. This inconsistency in the ACC lock queues caused the scan lock takeover mechanism to fail, as it incorrectly concluded that a lock to take over was not held. The same failure caused an assert when aborting an operation that was a member of such an inconsistent parallel lock queue. (Bug #92100, Bug #28530928)
A data node failed during startup due to the arrival of a SCAN_FRAGREQ signal during the restore phase. This signal originated from a scan that had begun before the node's previous failure and that should have been aborted because the failed node was involved in it. (Bug #92059, Bug #28518448)
DBTUP sent the error Tuple corruption detected when a read operation attempted to read the value of a tuple inserted within the same transaction. (Bug #92009, Bug #28500861)
References: See also: Bug #28893633.
False constraint violation errors could occur when executing updates on self-referential foreign keys. (Bug #91965, Bug #28486390)
References: See also: Bug #90644, Bug #27930382.
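The following sketch shows the general shape of the affected statements; the schema and values are hypothetical and not the original reproduction case:

    -- A table with a self-referential foreign key.
    CREATE TABLE item (
        id        INT NOT NULL PRIMARY KEY,
        parent_id INT,
        KEY (parent_id),
        FOREIGN KEY (parent_id) REFERENCES item (id)
    ) ENGINE=NDB;

    INSERT INTO item VALUES (1, NULL), (2, 1), (3, 1);

    -- Updating a row's reference within the same table could previously
    -- raise a spurious foreign key violation even though the referenced
    -- row (id = 2) exists.
    UPDATE item SET parent_id = 2 WHERE id = 3;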
An NDB internal trigger definition could be dropped while pending instances of the trigger remained to be executed; these later attempted to look up the definition of a trigger which had already been released. This caused unpredictable, and thus unsafe, behavior, possibly leading to data node failure. The root cause of the issue lay in an invalid assumption in the code relating to determining whether a given trigger had been released; the issue is fixed by ensuring that the behavior of NDB, when a trigger definition is determined to have been released, is consistent and meets expectations. (Bug #91894, Bug #28451957)
In some cases, a workload that included a high number of concurrent inserts caused data node failures when using debug builds. (Bug #91764, Bug #28387450, Bug #29055038)
During an initial node restart with disk data tables present and TwoPassInitialNodeRestartCopy enabled, DBTUP used an unsafe scan in disk order. Such scans are no longer employed in this case. (Bug #91724, Bug #28378227)
Checking for old LCP files tested the table version, but this was not always dependable. Now, instead of relying on the table version, the check regards as invalid any LCP file having a maxGCI smaller than its createGci. (Bug #91637, Bug #28346565)
In certain cases, a cascade update trigger was fired repeatedly on the same record, which eventually consumed all available concurrent operations, leading to Error 233 Out of operation records in transaction coordinator (increase MaxNoOfConcurrentOperations). If MaxNoOfConcurrentOperations was set to a value sufficiently high to avoid this, the issue manifested as data nodes consuming very large amounts of CPU, very likely eventually leading to a timeout. (Bug #91472, Bug #28262259)
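A hedged sketch of the kind of schema in which such cascading updates arise; the tables, columns, and values are hypothetical, and the foreign key here references a unique key on the parent rather than its primary key:

    -- Updates of parent.uk cascade to matching child rows, each firing a
    -- cascade update trigger; names and values are illustrative only.
    CREATE TABLE parent (
        id INT NOT NULL PRIMARY KEY,
        uk INT NOT NULL,
        UNIQUE KEY (uk)
    ) ENGINE=NDB;

    CREATE TABLE child (
        id        INT NOT NULL PRIMARY KEY,
        parent_uk INT,
        KEY (parent_uk),
        FOREIGN KEY (parent_uk) REFERENCES parent (uk) ON UPDATE CASCADE
    ) ENGINE=NDB;

    UPDATE parent SET uk = uk + 100 WHERE id = 1;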
Inserting a row into an NDB table having a self-referencing foreign key that referenced a unique index on the table other than the primary key failed with ER_NO_REFERENCED_ROW_2. This was due to the fact that NDB checked foreign key constraints before the unique index was updated, so that the constraint check was unable to use the index for locating the row. Now, in such cases, NDB waits until all unique index values have been updated before checking foreign key constraints on the inserted row. (Bug #90644, Bug #27930382)
References: See also: Bug #91965, Bug #28486390.
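A minimal sketch of the formerly failing case, using hypothetical names and values:

    -- A self-referencing foreign key whose parent side is a unique key
    -- rather than the primary key.
    CREATE TABLE t (
        id        INT NOT NULL PRIMARY KEY,
        uk        INT NOT NULL,
        parent_uk INT,
        UNIQUE KEY (uk),
        KEY (parent_uk),
        FOREIGN KEY (parent_uk) REFERENCES t (uk)
    ) ENGINE=NDB;

    -- The new row references its own unique key value; this previously
    -- failed with ER_NO_REFERENCED_ROW_2 because the constraint was
    -- checked before the unique index had been updated.
    INSERT INTO t VALUES (1, 10, 10);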
A connection string beginning with a slash (/) character is now rejected by ndb_mgmd. Our thanks to Daniël van Eeden for contributing this fix. (Bug #90582, Bug #27912892)