MySQL NDB Cluster 8.0 Release Notes
Performance: This release introduces a number of significant improvements in the performance of scans; these are listed here:
Row checksums help detect hardware issues, but do so at the expense of performance. NDB now makes it possible to disable them by setting the new ndb_row_checksum server system variable to 0; doing this means that row checksums are not used for new or altered tables. This can have a significant impact (5 to 10 percent, in some cases) on performance for all types of queries. This variable is set to 1 by default, to provide compatibility with the previous behavior. (See the example following this list.)
A query consisting of a scan can now execute for a longer time in the LDM threads when the queue is not busy.
Previously, columns were read before a pushed condition was checked; now the pushed condition is checked before any columns are read.
Performance of pushed joins should see significant improvement when using range scans as part of join execution.
(WL #11722)
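The following is a minimal sketch of disabling row checksums, assuming a session-scope setting of the variable; the table and column names are hypothetical:

    SET ndb_row_checksum = 0;

    -- Tables created or altered while the variable is 0 are stored
    -- without row checksums; existing tables are unaffected until
    -- they are altered or rebuilt.
    CREATE TABLE t1 (
        id  INT NOT NULL PRIMARY KEY,
        val VARCHAR(255)
    ) ENGINE=NDBCLUSTER;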
NDB Disk Data: NDB now implements schema distribution of disk data objects, including tablespaces and log file groups, by SQL nodes when they connect to a cluster, just as it does for NDB databases and in-memory tables. This eliminates a possible mismatch between the MySQL data dictionary and the NDB dictionary following a native backup and restore, which could arise when disk data tablespaces and undo log file groups were restored to the NDB dictionary but not to the MySQL Server's data dictionary. (WL #12172)
NDB Disk Data: NDB now makes use of the MySQL data dictionary to ensure correct distribution of tablespaces and log file groups across all cluster SQL nodes when connecting to the cluster. (WL #12333)
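For reference, the disk data objects whose definitions are distributed in this fashion are created with statements such as the following; the object names and sizes shown here are arbitrary:

    CREATE LOGFILE GROUP lg_1
        ADD UNDOFILE 'undo_1.log'
        INITIAL_SIZE 128M
        UNDO_BUFFER_SIZE 8M
        ENGINE NDBCLUSTER;

    CREATE TABLESPACE ts_1
        ADD DATAFILE 'data_1.dat'
        USE LOGFILE GROUP lg_1
        INITIAL_SIZE 256M
        ENGINE NDBCLUSTER;

An SQL node that connects to the cluster now installs these definitions in its own data dictionary automatically.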
The extra metadata property for NDB tables is now used to store information from the MySQL data dictionary. Because this information is significantly larger than the binary representation previously stored here (a .frm file, no longer used), the hard-coded size limit for this extra metadata has been increased.
This change can have an impact on downgrades: Trying to read NDB tables created in NDB 8.0.14 and later may cause data nodes running NDB 8.0.13 or earlier to fail on startup with NDB error code 2355 Failure to restore schema: Permanent error, external action needed: Resource configuration error. This can happen if the table's metadata exceeds 6K in size, which was the old limit. Tables created in NDB 8.0.13 and earlier can be read by later versions without any issues.
For more information, see Changes in NDB table extra metadata; see also MySQL Data Dictionary. (Bug #27230681, WL #10665)
Packaging: Expected NDB header files were in the devel RPM package instead of libndbclient-devel. (Bug #84580, Bug #26448330)
ndbmemcache: libndbclient.so was not able to find and load libssl.so, which could cause issues with ndbmemcache and Java-based programs using NDB. (Bug #26824659)
References: See also: Bug #27882088, Bug #28410275.
MySQL NDB ClusterJ: The ndb.clusterj test for NDB 8.0.13 failed when run more than once. This was due to a new, stricter rule in NDB 8.0.13 that does not allow temporary files to be left behind in the variable directory of mysql-test-run (mtr). With this fix, the temporary files are deleted before the test is executed. (Bug #28279038)
MySQL NDB ClusterJ: A NullPointerException was thrown when a full table scan was performed with ClusterJ on tables containing either a BLOB or a TEXT field. This was because the proper object initializations were omitted; they have now been added by this fix. (Bug #28199372, Bug #91242)
The version_comment system variable was not correctly configured in mysqld binaries and returned a generic pattern instead of the proper value. This affected all NDB Cluster binary releases with the exception of .deb packages. (Bug #29054235)
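A quick way to verify the value reported by a given mysqld binary is to query the variable directly, for example:

    SELECT @@version_comment;
    SHOW VARIABLES LIKE 'version_comment';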
Trying to build from source using -DWITH_NDBCLUSTER and -Werror failed with GCC 8. (Bug #28707282)
When copying deleted rows from a live node to a node just starting, it is possible for one or more of these rows to have a global checkpoint index equal to zero. If this happened at the same time that a full local checkpoint was started due to the undo log getting full, the LCP_SKIP bit was set for a row having GCI = 0, leading to an unplanned shutdown of the data node. (Bug #28372628)
ndbmtd sometimes experienced a hang when exiting due to log thread shutdown. (Bug #28027150)
NDB has an upper limit of 128 characters for a fully qualified table name. Due to the fact that mysqld names NDB tables using the format database_name/catalog_name/table_name, where catalog_name is always def, it is possible for statements such as CREATE TABLE to fail in spite of the fact that neither the table name nor the database name exceeds the 63-character limit imposed by NDB. The error raised in such cases was misleading and has been replaced. (Bug #27769521)
References: See also: Bug #27769801.
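As an illustration (the identifier lengths here are hypothetical), consider a database name and a table name that are each 62 characters long, both within NDB's 63-character identifier limit. The fully qualified name generated by mysqld then has the form

    database_name/def/table_name  =  62 + 1 + 3 + 1 + 62  =  129 characters

which exceeds the 128-character limit, so the CREATE TABLE statement is rejected even though each identifier is individually legal.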
When the SUMA kernel block receives a SUB_STOP_REQ signal, it executes the signal, then replies with SUB_STOP_CONF. (After this response is relayed back to the API, the API is open to send more SUB_STOP_REQ signals.) After sending the SUB_STOP_CONF, SUMA drops the subscription if no subscribers are present, which involves sending multiple DROP_TRIG_IMPL_REQ messages to DBTUP. LocalProxy can handle up to 21 of these requests in parallel; any more than this are queued in the Short Time Queue. When execution of a DROP_TRIG_IMPL_REQ was delayed, there was a chance for the queue to become overloaded, leading to a data node shutdown with Error in short time queue.
This issue is fixed by delaying the execution of the SUB_STOP_REQ signal if DBTUP is already handling DROP_TRIG_IMPL_REQ signals at full capacity, rather than queueing up the DROP_TRIG_IMPL_REQ signals. (Bug #26574003)
ndb_restore returned -1 instead of the expected exit code in the event of an index rebuild failure. (Bug #25112726)
When starting, a data node copies metadata, while a local checkpoint updates metadata. To avoid any conflict, any ongoing LCP activity is paused while metadata is being copied. An issue arose when a local checkpoint was paused on a given node, and another node that was also restarting checked for a complete LCP on this node; the check actually caused the LCP to be completed before copying of metadata was complete and so ended the pause prematurely. Now in such cases, the LCP completion check waits to complete a paused LCP until copying of metadata is finished and the pause ends as expected, within the LCP in which it began. (Bug #24827685)
ndbout and ndberr became invalid after exiting from mgmd_run(), and redirecting to them before the next call to mgmd_run() caused a segmentation fault during an ndb_mgmd service restart. This fix ensures that ndbout and ndberr remain valid at all times. (Bug #17732772, Bug #28536919)
NdbScanFilter did not always handle NULL according to the SQL standard, which could result in sending non-qualifying rows to be filtered (otherwise not necessary) by the MySQL server. (Bug #92407, Bug #28643463)
References: See also: Bug #93977, Bug #29231709.
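The SQL standard behavior involved here is that a comparison with NULL evaluates to unknown, so a row with NULL in the compared column never qualifies. A minimal sketch, with hypothetical table and column names:

    CREATE TABLE t1 (
        id INT NOT NULL PRIMARY KEY,
        x  INT
    ) ENGINE=NDBCLUSTER;

    INSERT INTO t1 VALUES (1, 5), (2, NULL);

    -- Only the row (1, 5) qualifies; the row with x = NULL must not be
    -- returned by the pushed-down filter, and should no longer need to
    -- be re-filtered by the MySQL server.
    SELECT * FROM t1 WHERE x < 10;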
The internal function ndb_my_error() was used in ndbcluster_get_tablespace_statistics() and prepare_inplace_alter_table() to report errors when the function failed to interact with NDB. The function was expected to push the NDB error as a warning on the stack, then set an error by translating the NDB error to a MySQL error, and finally call my_error() with the translated error. When calling my_error(), the function extracts a format string that may contain placeholders and uses the format string in a function similar to sprintf(); because my_error() was called without any arguments, this could read arbitrary memory, leading to a segmentation fault.
The fix is always to push the NDB error as a warning and then set an error with a provided message. A new helper function has been added to Thd_ndb to be used in place of ndb_my_error(). (Bug #92244, Bug #28575934)
Running out of undo log buffer memory was reported using error 921 Out of transaction memory ... (increase SharedGlobalMemory).
This problem is fixed by introducing a new error code 923 Out of undo buffer memory (increase UNDO_BUFFER_SIZE). (Bug #92125, Bug #28537319)
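The buffer referred to by the new error is sized with the UNDO_BUFFER_SIZE option when the log file group is created, as in this sketch (the name and sizes shown are arbitrary):

    CREATE LOGFILE GROUP lg_undo
        ADD UNDOFILE 'undo_big.log'
        INITIAL_SIZE 512M
        UNDO_BUFFER_SIZE 64M    -- increase this value if error 923 is reported
        ENGINE NDBCLUSTER;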
When moving an OperationRec from the serial to the parallel queue, Dbacc::startNext() failed to update the Operationrec::OP_ACC_LOCK_MODE flag, which is required to reflect the accumulated OP_LOCK_MODE of all previous operations in the parallel queue. This inconsistency in the ACC lock queues caused the scan lock takeover mechanism to fail, as it incorrectly concluded that a lock to take over was not held. The same failure caused an assert when aborting an operation that was a member of such an inconsistent parallel lock queue. (Bug #92100, Bug #28530928)
ndb_restore did not free all memory used after being called to restore a table that already existed. (Bug #92085, Bug #28525898)
A data node failed during startup due to the arrival of a SCAN_FRAGREQ signal during the restore phase. This signal originated from a scan begun before the node's previous failure, and should have been aborted because of that node's involvement in it. (Bug #92059, Bug #28518448)
DBTUP sent the error Tuple corruption detected when a read operation attempted to read the value of a tuple inserted within the same transaction. (Bug #92009, Bug #28500861)
References: See also: Bug #28893633.
False constraint violation errors could occur when executing updates on self-referential foreign keys. (Bug #91965, Bug #28486390)
References: See also: Bug #90644, Bug #27930382.
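A minimal sketch of the kind of self-referential foreign key affected; the table and column names are hypothetical:

    CREATE TABLE employee (
        id         INT NOT NULL PRIMARY KEY,
        manager_id INT,
        INDEX (manager_id),
        FOREIGN KEY (manager_id) REFERENCES employee (id)
    ) ENGINE=NDBCLUSTER;

    INSERT INTO employee VALUES (1, NULL), (2, NULL);

    -- Prior to this fix, an update such as the following could raise a
    -- spurious foreign key constraint violation even though the
    -- referenced row exists in the same table.
    UPDATE employee SET manager_id = 1 WHERE id = 2;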
An NDB internal trigger definition could be dropped while pending instances of the trigger remained to be executed, so that an attempt could be made to look up the definition of a trigger which had already been released. This caused unpredictable and thus unsafe behavior, possibly leading to data node failure. The root cause of the issue lay in an invalid assumption in the code relating to determining whether a given trigger had been released; the issue is fixed by ensuring that the behavior of NDB, when a trigger definition is determined to have been released, is consistent and meets expectations. (Bug #91894, Bug #28451957)
In some cases, a workload that included a high number of concurrent inserts caused data node failures when using debug builds. (Bug #91764, Bug #28387450, Bug #29055038)
During an initial node restart with disk data tables present and TwoPassInitialNodeRestartCopy enabled, DBTUP used an unsafe scan in disk order. Such scans are no longer employed in this case. (Bug #91724, Bug #28378227)
Checking for old LCP files tested the table version, but this was not always dependable. Now, instead of relying on the table version, the check regards as invalid any LCP file having a maxGCI smaller than its createGci. (Bug #91637, Bug #28346565)
In certain cases, a cascade update trigger was fired repeatedly on the same record, which eventually consumed all available concurrent operations, leading to Error 233 Out of operation records in transaction coordinator (increase MaxNoOfConcurrentOperations). If MaxNoOfConcurrentOperations was set to a value sufficiently high to avoid this, the issue manifested as data nodes consuming very large amounts of CPU, very likely eventually leading to a timeout. (Bug #91472, Bug #28262259)
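A sketch of the kind of cascading foreign key involved; the names are hypothetical, and the child table references a non-primary unique key of the parent, since NDB does not support ON UPDATE CASCADE where the referenced key is the parent's primary key:

    CREATE TABLE parent (
        id INT NOT NULL PRIMARY KEY,
        uk INT NOT NULL,
        UNIQUE KEY (uk)
    ) ENGINE=NDBCLUSTER;

    CREATE TABLE child (
        id        INT NOT NULL PRIMARY KEY,
        parent_uk INT,
        INDEX (parent_uk),
        FOREIGN KEY (parent_uk) REFERENCES parent (uk) ON UPDATE CASCADE
    ) ENGINE=NDBCLUSTER;

    -- Updating the referenced key fires cascading update triggers on the
    -- matching child rows; previously such a trigger could fire repeatedly
    -- on the same record.
    UPDATE parent SET uk = uk + 100 WHERE id = 1;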