MySQL NDB Cluster 8.0 Release Notes
Performance: This release introduces a number of significant improvements in the performance of scans; these are listed here:
Row checksums help detect hardware issues, but do so at the expense of performance. NDB now makes it possible to disable them by setting the new ndb_row_checksum server system variable to 0; doing this means that row checksums are not used for new or altered tables. This can have a significant impact (5 to 10 percent, in some cases) on performance for all types of queries. This variable is set to 1 by default, to provide compatibility with the previous behavior. (See the example following this list.)
A query consisting of a scan can now execute for a longer time in the LDM threads when the queue is not busy.
Previously, columns were read before a pushed condition was checked; now the pushed condition is checked before any columns are read.
Performance of pushed joins should see significant improvement when using range scans as part of join execution.
(WL #11722)
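The following is a minimal sketch of disabling row checksums, assuming a session-scope setting of the variable; the table and column names are hypothetical:

    SET ndb_row_checksum = 0;

    -- Tables created or altered while the variable is 0 are stored
    -- without row checksums; existing tables are unaffected until
    -- they are altered or rebuilt.
    CREATE TABLE t1 (
        id  INT NOT NULL PRIMARY KEY,
        val VARCHAR(255)
    ) ENGINE=NDBCLUSTER;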
NDB Disk Data: NDB now implements schema distribution of disk data objects, including tablespaces and log file groups, by SQL nodes when they connect to a cluster, just as it does for NDB databases and in-memory tables. This eliminates a possible mismatch between the MySQL data dictionary and the NDB dictionary following a native backup and restore, which could arise when disk data tablespaces and undo log file groups were restored to the NDB dictionary but not to the MySQL Server's data dictionary. (WL #12172)
NDB Disk Data: NDB now makes use of the MySQL data dictionary to ensure correct distribution of tablespaces and log file groups across all cluster SQL nodes when connecting to the cluster. (WL #12333)
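For reference, the disk data objects whose definitions are distributed in this fashion are created with statements such as the following; the object names and sizes shown here are arbitrary:

    CREATE LOGFILE GROUP lg_1
        ADD UNDOFILE 'undo_1.log'
        INITIAL_SIZE 128M
        UNDO_BUFFER_SIZE 8M
        ENGINE NDBCLUSTER;

    CREATE TABLESPACE ts_1
        ADD DATAFILE 'data_1.dat'
        USE LOGFILE GROUP lg_1
        INITIAL_SIZE 256M
        ENGINE NDBCLUSTER;

An SQL node that connects to the cluster now installs these definitions in its own data dictionary automatically.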
The extra metadata property for NDB tables is now used to store information from the MySQL data dictionary. Because this information is significantly larger than the binary representation previously stored here (a .frm file, no longer used), the hard-coded size limit for this extra metadata has been increased.
This change can have an impact on downgrades: Trying to read NDB tables created in NDB 8.0.14 and later may cause data nodes running NDB 8.0.13 or earlier to fail on startup with NDB error code 2355 Failure to restore schema: Permanent error, external action needed: Resource configuration error. This can happen if the table's metadata exceeds 6K in size, which was the old limit. Tables created in NDB 8.0.13 and earlier can be read by later versions without any issues.
For more information, see Changes in NDB table extra metadata; see also MySQL Data Dictionary. (Bug #27230681, WL #10665)
Packaging: Expected NDB header files were in the devel RPM package instead of libndbclient-devel. (Bug #84580, Bug #26448330)
ndbmemcache: libndbclient.so was not able to find and load libssl.so, which could cause issues with ndbmemcache and Java-based programs using NDB. (Bug #26824659)
References: See also: Bug #27882088, Bug #28410275.
MySQL NDB ClusterJ: The ndb.clusterj test for NDB 8.0.13 failed when run more than once. This was due to a new, stricter rule in NDB 8.0.13 that does not allow temporary files to be left behind in the variable directory of mysql-test-run (mtr). With this fix, the temporary files are deleted before the test is executed. (Bug #28279038)
MySQL NDB ClusterJ: A NullPointerException was thrown when a full table scan was performed with ClusterJ on tables containing either a BLOB or a TEXT field. This was because the proper object initializations were omitted; they have now been added by this fix. (Bug #28199372, Bug #91242)
The version_comment system variable was not correctly configured in mysqld binaries and returned a generic pattern instead of the proper value. This affected all NDB Cluster binary releases with the exception of .deb packages. (Bug #29054235)
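A quick way to verify the value reported by a given mysqld binary is to query the variable directly, for example:

    SELECT @@version_comment;
    SHOW VARIABLES LIKE 'version_comment';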
Trying to build from source using -DWITH_NDBCLUSTER and -Werror failed with GCC 8. (Bug #28707282)
When copying deleted rows from a live node to a node just starting, it is possible for one or more of these rows to have a global checkpoint index equal to zero. If this happened at the same time that a full local checkpoint was started due to the undo log getting full, the LCP_SKIP bit was set for a row having GCI = 0, leading to an unplanned shutdown of the data node. (Bug #28372628)
ndbmtd sometimes experienced a hang when exiting due to log thread shutdown. (Bug #28027150)
NDB has an upper limit of 128 characters for a fully qualified table name. Due to the fact that mysqld names NDB tables using the format database_name/catalog_name/table_name, where catalog_name is always def, it is possible for statements such as CREATE TABLE to fail in spite of the fact that neither the table name nor the database name exceeds the 63-character limit imposed by NDB. The error raised in such cases was misleading and has been replaced. (Bug #27769521)
References: See also: Bug #27769801.
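As an illustration (the identifier lengths here are hypothetical), consider a database name and a table name that are each 62 characters long, both within NDB's 63-character identifier limit. The fully qualified name generated by mysqld then has the form

    database_name/def/table_name  =  62 + 1 + 3 + 1 + 62  =  129 characters

which exceeds the 128-character limit, so the CREATE TABLE statement is rejected even though each identifier is individually legal.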
When the SUMA kernel block receives a SUB_STOP_REQ signal, it executes the signal, then replies with SUB_STOP_CONF. (After this response is relayed back to the API, the API is open to send more SUB_STOP_REQ signals.) After sending the SUB_STOP_CONF, SUMA drops the subscription if no subscribers are present, which involves sending multiple DROP_TRIG_IMPL_REQ messages to DBTUP. LocalProxy can handle up to 21 of these requests in parallel; any more than this are queued in the Short Time Queue. When execution of a DROP_TRIG_IMPL_REQ was delayed, there was a chance for the queue to become overloaded, leading to a data node shutdown with Error in short time queue.
This issue is fixed by delaying the execution of the SUB_STOP_REQ signal if DBTUP is already handling DROP_TRIG_IMPL_REQ signals at full capacity, rather than queueing up the DROP_TRIG_IMPL_REQ signals. (Bug #26574003)
ndb_restore returned -1 instead of the expected exit code in the event of an index rebuild failure. (Bug #25112726)
When starting, a data node copies metadata, while a local checkpoint updates metadata. To avoid any conflict, any ongoing LCP activity is paused while metadata is being copied. An issue arose when a local checkpoint was paused on a given node, and another node that was also restarting checked for a complete LCP on this node; the check actually caused the LCP to be completed before copying of metadata was complete and so ended the pause prematurely. Now in such cases, the LCP completion check waits to complete a paused LCP until copying of metadata is finished and the pause ends as expected, within the LCP in which it began. (Bug #24827685)
ndbout and ndberr became invalid after exiting from mgmd_run(), and redirecting to them before the next call to mgmd_run() caused a segmentation fault during an ndb_mgmd service restart. This fix ensures that ndbout and ndberr remain valid at all times. (Bug #17732772, Bug #28536919)
NdbScanFilter did not always handle NULL according to the SQL standard, which could result in sending non-qualifying rows to be filtered (otherwise not necessary) by the MySQL server. (Bug #92407, Bug #28643463)
References: See also: Bug #93977, Bug #29231709.
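The SQL standard behavior involved here is that a comparison with NULL evaluates to unknown, so a row with NULL in the compared column never qualifies. A minimal sketch, with hypothetical table and column names:

    CREATE TABLE t1 (
        id INT NOT NULL PRIMARY KEY,
        x  INT
    ) ENGINE=NDBCLUSTER;

    INSERT INTO t1 VALUES (1, 5), (2, NULL);

    -- Only the row (1, 5) qualifies; the row with x = NULL must not be
    -- returned by the pushed-down filter, and should no longer need to
    -- be re-filtered by the MySQL server.
    SELECT * FROM t1 WHERE x < 10;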
The internal function ndb_my_error() was used in ndbcluster_get_tablespace_statistics() and prepare_inplace_alter_table() to report errors when the function failed to interact with NDB. The function was expected to push the NDB error as a warning on the stack, then set an error by translating the NDB error to a MySQL error, and finally call my_error() with the translated error. When calling my_error(), the function extracts a format string that may contain placeholders and uses the format string in a function similar to sprintf(); because my_error() was called without any arguments, this could read arbitrary memory, leading to a segmentation fault.
The fix is always to push the NDB error as a warning and then set an error with a provided message. A new helper function has been added to Thd_ndb to be used in place of ndb_my_error(). (Bug #92244, Bug #28575934)
Running out of undo log buffer memory was reported using error 921 Out of transaction memory ... (increase SharedGlobalMemory).
This problem is fixed by introducing a new error code 923 Out of undo buffer memory (increase UNDO_BUFFER_SIZE). (Bug #92125, Bug #28537319)
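The buffer referred to by the new error is sized with the UNDO_BUFFER_SIZE option when the log file group is created, as in this sketch (the name and sizes shown are arbitrary):

    CREATE LOGFILE GROUP lg_undo
        ADD UNDOFILE 'undo_big.log'
        INITIAL_SIZE 512M
        UNDO_BUFFER_SIZE 64M    -- increase this value if error 923 is reported
        ENGINE NDBCLUSTER;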
When moving an OperationRec from the serial to the parallel queue, Dbacc::startNext() failed to update the Operationrec::OP_ACC_LOCK_MODE flag, which is required to reflect the accumulated OP_LOCK_MODE of all previous operations in the parallel queue. This inconsistency in the ACC lock queues caused the scan lock takeover mechanism to fail, as it incorrectly concluded that a lock to take over was not held. The same failure caused an assert when aborting an operation that was a member of such an inconsistent parallel lock queue. (Bug #92100, Bug #28530928)
ndb_restore did not free all memory used after being called to restore a table that already existed. (Bug #92085, Bug #28525898)
A data node failed during startup due to the arrival of a SCAN_FRAGREQ signal during the restore phase. This signal originated from a scan begun before the node's previous failure, and should have been aborted because of that node's involvement in it. (Bug #92059, Bug #28518448)
DBTUP sent the error Tuple corruption detected when a read operation attempted to read the value of a tuple inserted within the same transaction. (Bug #92009, Bug #28500861)
References: See also: Bug #28893633.
False constraint violation errors could occur when executing updates on self-referential foreign keys. (Bug #91965, Bug #28486390)
References: See also: Bug #90644, Bug #27930382.
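A minimal sketch of the kind of self-referential foreign key affected; the table and column names are hypothetical:

    CREATE TABLE employee (
        id         INT NOT NULL PRIMARY KEY,
        manager_id INT,
        INDEX (manager_id),
        FOREIGN KEY (manager_id) REFERENCES employee (id)
    ) ENGINE=NDBCLUSTER;

    INSERT INTO employee VALUES (1, NULL), (2, NULL);

    -- Prior to this fix, an update such as the following could raise a
    -- spurious foreign key constraint violation even though the
    -- referenced row exists in the same table.
    UPDATE employee SET manager_id = 1 WHERE id = 2;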
An NDB internal trigger definition could be dropped while pending instances of the trigger remained to be executed, so that an attempt could be made to look up the definition of a trigger which had already been released. This caused unpredictable and thus unsafe behavior, possibly leading to data node failure. The root cause of the issue lay in an invalid assumption in the code relating to determining whether a given trigger had been released; the issue is fixed by ensuring that the behavior of NDB, when a trigger definition is determined to have been released, is consistent and meets expectations. (Bug #91894, Bug #28451957)
In some cases, a workload that included a high number of concurrent inserts caused data node failures when using debug builds. (Bug #91764, Bug #28387450, Bug #29055038)
During an initial node restart with disk data tables present and TwoPassInitialNodeRestartCopy enabled, DBTUP used an unsafe scan in disk order. Such scans are no longer employed in this case. (Bug #91724, Bug #28378227)
Checking for old LCP files tested the table version, but this was not always dependable. Now, instead of relying on the table version, the check regards as invalid any LCP file having a maxGCI smaller than its createGci. (Bug #91637, Bug #28346565)
In certain cases, a cascade update trigger was fired repeatedly on the same record, which eventually consumed all available concurrent operations, leading to Error 233 Out of operation records in transaction coordinator (increase MaxNoOfConcurrentOperations). If MaxNoOfConcurrentOperations was set to a value sufficiently high to avoid this, the issue manifested as data nodes consuming very large amounts of CPU, very likely eventually leading to a timeout. (Bug #91472, Bug #28262259)
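A sketch of the kind of cascading foreign key involved; the names are hypothetical, and the child table references a non-primary unique key of the parent, since NDB does not support ON UPDATE CASCADE where the referenced key is the parent's primary key:

    CREATE TABLE parent (
        id INT NOT NULL PRIMARY KEY,
        uk INT NOT NULL,
        UNIQUE KEY (uk)
    ) ENGINE=NDBCLUSTER;

    CREATE TABLE child (
        id        INT NOT NULL PRIMARY KEY,
        parent_uk INT,
        INDEX (parent_uk),
        FOREIGN KEY (parent_uk) REFERENCES parent (uk) ON UPDATE CASCADE
    ) ENGINE=NDBCLUSTER;

    -- Updating the referenced key fires cascading update triggers on the
    -- matching child rows; previously such a trigger could fire repeatedly
    -- on the same record.
    UPDATE parent SET uk = uk + 100 WHERE id = 1;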