31 Changes in MySQL NDB Cluster 7.6.6 (5.7.22-ndb-7.6.6) (2018-05-31, General Availability)

MySQL NDB Cluster 7.6.6 is a new release of NDB 7.6, based on MySQL Server 5.7 and including features in version 7.6 of the NDB storage engine, as well as fixing recently discovered bugs in previous NDB Cluster releases.

Obtaining NDB Cluster 7.6. NDB Cluster 7.6 source code and binaries can be obtained from https://dev.mysql.com/downloads/cluster/.

For an overview of changes made in NDB Cluster 7.6, see What is New in NDB Cluster 7.6.

This release also incorporates all bug fixes and changes made in previous NDB Cluster releases, as well as all bug fixes and feature changes which were added in mainline MySQL 5.7 through MySQL 5.7.22 (see Changes in MySQL 5.7.22 (2018-04-19, General Availability)).

Functionality Added or Changed

When performing an NDB backup, the ndbinfo.logbuffers table now displays information regarding buffer usage by the backup process on each data node. This is implemented as rows reflecting two new log types in addition to REDO and DD-UNDO. One of these rows has the log type BACKUP-DATA, which shows the amount of data buffer used during backup to copy fragments to backup files. The other row has the log type BACKUP-LOG, which displays the amount of log buffer used during the backup to record changes made after the backup has started. One each of these log_type rows is shown in the logbuffers table for each data node in the cluster. Rows having these two log types are present in the table only while an NDB backup is currently in progress. (Bug #25822988)
Added the --logbuffer-size option for ndbd and ndbmtd, for use in debugging with a large number of log messages. This controls the size of the data node log buffer; the default (32K) is intended for normal operations. (Bug #89679, Bug #27550943)
The previously experimental shared memory (SHM) transporter is now supported in production. SHM works by transporting signals through writing them into memory, rather than on a socket. NDB already attempts to use SHM automatically between a data node and an API node sharing the same host. To enable explicit shared memory connections, set the UseShm configuration parameter to 1 for the relevant data node. When explicitly defining shared memory as the connection method, it is also necessary that the data node is identified by HostName and the API node by HostName.
Additional tuning parameters such as ShmSize, ShmSpintime, and SendBufferMemory can be employed to improve performance of the SHM transporter. Configuration of SHM is otherwise similar to that of the TCP transporter. The SigNum parameter is no longer used, and any settings made for it are now ignored. NDB Cluster Shared Memory Connections, provides more information about these parameters.
In addition, as part of this work, NDB code relating to support for the legacy SCI transporter, which had long been unsupported, has been removed. See www.dolphinics.com for information about support for legacy SCI hardware or information about the newer Dolphin Express hardware. (WL #7512)
The SPJ kernel block now takes into account when it is evaluating a join request in which at least some of the tables are used in inner joins. This means that SPJ can eliminate requests for rows or ranges as soon as it becomes known that a preceding request did not return any results for a parent row. This saves both the data nodes and the SPJ block from having to handle requests and result rows which never take part in a result row from an inner join.
Note
When upgrading from NDB 7.6.5 or earlier, you should be aware that this optimization depends on both API client and data node functionality, and so is not available until all of these have been upgraded.
(WL #11164)
The poll receiver which NDB uses to read from sockets, execute messages from the sockets, and wake up other threads now offloads wakeup of other threads to a new thread that wakes up the other threads on request, and otherwise simply sleeps. This improves the scalability of a single cluster connection by keeping the receive thread from becoming overburdened by tasks including wakeup of other threads. (WL #9663)

Bugs Fixed

Important Change; NDB Client Programs: ndb_top ignored short forms of command-line options, and did not in all cases handle misformed long options correctly. As part of the fix for these issues, the following changes have been made to command-line options used with ndb_top to bring them more into line with those used with other NDB Cluster and MySQL programs:
- The --passwd option is removed, and replaced by --password (short form -p).
- The short form -t for the --port option has been replaced by -P.
- The short form -x for the --text option has been replaced by -t.
(Bug #26907833)
References: See also: Bug #88236, Bug #20733646.
NDB Cluster APIs: A previous fix for an issue, in which the failure of multiple data nodes during a partial restart could cause API nodes to fail, did not properly check the validity of the associated NdbReceiver object before proceeding. Now in such cases an invalid object triggers handling for invalid signals, rather than a node failure. (Bug #25902137)
References: This issue is a regression of: Bug #25092498.
NDB Cluster APIs: Incorrect results, usually an empty result set, were returned when setBound() was used to specify a NULL bound. This issue appears to have been caused by a problem in gcc, limited to cases using the old version of this method (which does not employ NdbRecord), and is fixed by rewriting the problematic internal logic in the old implementation. (Bug #89468, Bug #27461752)
NDB Cluster APIs: Released NDB API objects are kept in one or more Ndb_free_list structures for later reuse. Each list also keeps track of all objects seized from it, and makes sure that these are eventually released back to it. In the event that the internal function NdbScanOperation::init() failed, it was possible for an NdbApiSignal already allocated by the NdbOperation to be leaked. Now in such cases, NdbScanOperation::release() is called to release any objects allocated by the failed NdbScanOperation before it is returned to the free list.
This fix also handles a similar issue with NdbOperation::init(), where a failed call could also leak a signal. (Bug #89249, Bug #27389894)
NDB Client Programs: ndb_top did not support a number of options common to most NDB programs. The following options are now supported:
In addition, ndb_top now supports a --socket option (short form -S) for specifying a socket file to use for the connection. (Bug #86614, Bug #26236298)
MySQL NDB ClusterJ: ClusterJ quit unexpectedly as there was no error handling in the scanIndex() function of the ClusterTransactionImpl class for a null returned to it internally by the scanIndex() method of the ndbTransaction class. (Bug #27297681, Bug #88989)
In some circumstances, when a transaction was aborted in the DBTC block, there remained links to trigger records from operation records which were not yet reference-counted, but when such an operation record was released the trigger reference count was still decremented. (Bug #27629680)
An NDB online backup consists of data, which is fuzzy, and a redo and undo log. To restore to a consistent state it is necessary to ensure that the log contains all of the changes spanning the capture of the fuzzy data portion and beyond to a consistent snapshot point. This is achieved by waiting for a GCI boundary to be passed after the capture of data is complete, but before stopping change logging and recording the stop GCI in the backup's metadata.
At restore time, the log is replayed up to the stop GCI, restoring the system to the state it had at the consistent stop GCI. A problem arose when, under load, it was possible to select a GCI boundary which occurred too early and did not span all the data captured. This could lead to inconsistencies when restoring the backup; these could be noticed as broken constraints or corrupted BLOB entries.
Now the stop GCI is chosen is so that it spans the entire duration of the fuzzy data capture process, so that the backup log always contains all data within a given stop GCI. (Bug #27497461)
References: See also: Bug #27566346.
For NDB tables, when a foreign key was added or dropped as a part of a DDL statement, the foreign key metatdata for all parent tables referenced should be reloaded in the handler on all SQL nodes connected to the cluster, but this was done only on the mysqld on which the statement was executed. Due to this, any subsequent queries relying on foreign key metadata from the corresponding parent tables could return inconsistent results. (Bug #27439587)
References: See also: Bug #82989, Bug #24666177.
ANALYZE TABLE used excessive amounts of CPU on large, low-cardinality tables. (Bug #27438963)
Queries using very large lists with IN were not handled correctly, which could lead to data node failures. (Bug #27397802)
References: See also: Bug #28728603.
A data node overload could in some situations lead to an unplanned shutdown of the data node, which led to all data nodes disconnecting from the management and nodes.
This was due to a situation in which API_FAILREQ was not the last received signal prior to the node failure.
As part of this fix, the transaction coordinator's handling of SCAN_TABREQ signals for an ApiConnectRecord in an incorrect state was also improved. (Bug #27381901)
References: See also: Bug #47039, Bug #11755287.
In a two-node cluster, when the node having the lowest ID was started using --nostart, API clients could not connect, failing with Could not alloc node id at HOST port PORT_NO: No free node id found for mysqld(API). (Bug #27225212)
Changing MaxNoOfExecutionThreads without an initial system restart led to an unplanned data node shutdown. (Bug #27169282)
References: This issue is a regression of: Bug #26908347, Bug #26968613.
Race conditions sometimes occurred during asynchronous disconnection and reconnection of the transporter while other threads concurrently inserted signal data into the send buffers, leading to an unplanned shutdown of the cluster.
As part of the work fixing this issue, the internal templating function used by the Transporter Registry when it prepares a send is refactored to use likely-or-unlikely logic to speed up execution, and to remove a number of duplicate checks for NULL. (Bug #24444908, Bug #25128512)
References: See also: Bug #20112700.
ndb_restore sometimes logged data file and log file progress values much greater than 100%. (Bug #20989106)
The internal function BitmaskImpl::setRange() set one bit fewer than specified. (Bug #90648, Bug #27931995)
It was not possible to create an NDB table using PARTITION_BALANCE set to FOR_RA_BY_LDM_X_2, FOR_RA_BY_LDM_X_3, or FOR_RA_BY_LDM_X_4. (Bug #89811, Bug #27602352)
References: This issue is a regression of: Bug #81759, Bug #23544301.
Adding a [tcp] or [shm] section to the global configuration file for a cluster with multiple data nodes caused default TCP connections to be lost to the node using the single section. (Bug #89627, Bug #27532407)
As a result of the reuse of code intended for send threads when performing an assist send, all of the local release send buffers were released to the global pool, which caused the intended level of the local send buffer pool to be ignored. Now send threads and assisting worker threads follow their own policies for maintaining their local buffer pools. (Bug #89119, Bug #27349118)
When sending priority A signals, we now ensure that the number of pending signals is explicitly initialized. (Bug #88986, Bug #27294856)
In a MySQL Cluster with one MySQL Server configured to write a binary log failure occurred when creating and using an NDB table with non-stored generated columns. The problem arose only when the product was built with debugging support. (Bug #86084, Bug #25957586)
ndb_restore --print-data --hex did not print trailing 0s of LONGVARBINARY values. (Bug #65560, Bug #14198580)
When the internal function ha_ndbcluster::copy_fk_for_offline_alter() checked dependent objects on a table from which it was supposed to drop a foreign key, it did not perform any filtering for foreign keys, making it possible for it to attempt retrieval of an index or trigger instead, leading to a spurious Error 723 (No such table).