MySQL NDB Cluster 7.6 Release Notes
MySQL NDB Cluster 7.6.4 is a new release of NDB 7.6, based on
MySQL Server 5.7 and including features in version 7.6 of the
NDB storage engine, as well as fixing recently discovered bugs
in previous NDB Cluster releases.
Obtaining NDB Cluster 7.6. NDB Cluster 7.6 source code and binaries can be obtained from https://dev.mysql.com/downloads/cluster/.
For an overview of changes made in NDB Cluster 7.6, see What is New in NDB Cluster 7.6.
This release also incorporates all bug fixes and changes made in previous NDB Cluster releases, as well as all bug fixes and feature changes which were added in mainline MySQL 5.7 through MySQL 5.7.20 (see Changes in MySQL 5.7.20 (2017-10-16, General Availability)).
Incompatible Change; NDB Disk Data:
Due to changes in disk file formats, it is necessary to perform
an --initial restart of each data node when upgrading to or
downgrading from this release.
Important Change; NDB Disk Data: NDB Cluster has improved node restart times and overall performance with larger data sets by implementing partial local checkpoints (LCPs). Prior to this release, an LCP always made a copy of the entire database.
NDB now supports LCPs that write individual records, so it is
no longer strictly necessary for an LCP to write the entire
database. Since, at recovery, it remains necessary to restore
the database fully, the strategy is to save one fourth of all
records at each LCP, as well as to write the records that have
changed since the last LCP.
Two data node configuration parameters relating to this change
are introduced in this release: EnablePartialLcp (default true,
or enabled) enables partial LCPs. When partial LCPs are enabled,
RecoveryWork controls the percentage of space given over to
LCPs; it increases with the amount of work which must be
performed on LCPs during restarts as opposed to that performed
during normal operations. Raising this value causes LCPs during
normal operations to require writing fewer records and so
decreases the usual workload. Raising this value also means
that restarts can take longer.
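For example, both parameters can be set in the [ndbd default]
section of the config.ini file; the values shown here are
purely illustrative and not recommendations:
[ndbd default]
# Partial LCPs are enabled by default; shown here for clarity
EnablePartialLcp=true
# Percentage of storage given over to LCPs (default is 50)
RecoveryWork=60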
Upgrading to NDB 7.6.4 or downgrading from this release
requires purging then re-creating the NDB
data node file system, which means that an initial restart of
each data node is needed. An initial node restart still
requires a complete LCP; a partial LCP is not used for this
purpose.
A rolling restart or system restart is a normal part of an NDB
software upgrade. When such a restart is performed as part of an
upgrade to NDB 7.6.4 or later, any existing LCP files are
checked for the presence of the LCP sysfile, indicating that the
existing data node file system was written using NDB 7.6.4 or
later. If such a node file system exists, but does not contain
the sysfile, and if any data nodes are restarted without the
--initial option, NDB causes the restart to fail with an
appropriate error message. This detection can be performed only
as part of an upgrade; it is not possible to do so as part of a
downgrade to NDB 7.6.3 or earlier from a later release.
Exception: If there are no data node files—that is, in the
event of a “clean” start or restart—using --initial is not
required for a software upgrade, since this is already
equivalent to an initial restart. (This aspect of restarts is
unchanged from previous releases of NDB Cluster.)
In addition, the default value for StartPartitionedTimeout is
changed from 60000 to 0.
This release also deprecates the data node configuration
parameters BackupDataBufferSize, BackupWriteSize, and
BackupMaxWriteSize; these are now subject to removal in a future
NDB Cluster version.
(Bug #27308632, WL #8069, WL #10302, WL #10993)
Important Change:
Added the ndb_perror utility for obtaining information about
NDB Cluster error codes. This tool replaces perror --ndb; the
--ndb option for perror is now deprecated and raises a warning
when used; the option is subject to removal in a future NDB
version.
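For example, the old and new invocations compare as shown here
(the error code used is an arbitrary illustration):
$ perror --ndb 321     # deprecated; now raises a warning
$ ndb_perror 321       # replacement utility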
See ndb_perror — Obtain NDB Error Message Information, for more information. (Bug #81703, Bug #81704, Bug #23523869, Bug #23523926)
References: See also: Bug #26966826, Bug #88086.
NDB Client Programs: NDB Cluster Auto-Installer node configuration parameters as supported in the UI and accompanying documentation were in some cases hard coded to an arbitrary value, or were missing altogether. Configuration parameters, their default values, and the documentation have been better aligned with those found in release versions of the NDB Cluster software.
A necessary part of this task was implementing a mechanism,
which the Auto-Installer now provides, for setting parameters
that take discrete values. For example, the value of the data
node parameter Arbitration must now be one of Default, Disabled,
or WaitExternal.
The Auto-Installer also now obtains the amount of disk space
available to NDB on each host and uses it to derive reasonable
default values for configuration parameters which depend on this
value.
See The NDB Cluster Auto-Installer (NDB 7.5) (NO LONGER SUPPORTED), for more information. (WL #10340, WL #10408, WL #10449)
NDB Client Programs: Secure connection support in the MySQL NDB Cluster Auto-Installer has been updated or improved in this release as follows:
Added a mechanism for setting SSH membership on a per-host basis.
Updated the Paramiko Python module to the most recent available version (2.6.1).
Provided a place in the GUI for encrypted private key passwords, and discontinued use of hardcoded passwords.
Related enhancements implemented in the current release include the following:
Discontinued use of cookies as a persistent store for NDB Cluster configuration information; these were not secure and came with a hard upper limit on storage. Now the Auto-Installer uses an encrypted file for this purpose.
In order to secure data transfer between the web browser front end and the back end web server, the default communications protocol has been switched from HTTP to HTTPS.
See The NDB Cluster Auto-Installer (NDB 7.5) (NO LONGER SUPPORTED), for more information. (WL #10426, WL #11128, WL #11289)
MySQL NDB ClusterJ: ClusterJ now supports CPU binding for receive threads through the setRecvThreadCPUids() and getRecvThreadCPUids() methods. Also, the receive thread activation threshold can be set and get with the setRecvThreadActivationThreshold() and getRecvThreadActivationThreshold() methods. (WL #10815)
It is now possible to specify a set of cores to be used for I/O
threads performing offline multithreaded builds of ordered
indexes, as opposed to normal I/O duties such as file I/O,
compression, or decompression. “Offline” in this context refers
to building of ordered indexes performed when the parent table
is not being written to; such building takes place when an NDB
cluster performs a node or system restart, or as part of
restoring a cluster from backup using ndb_restore
--rebuild-indexes.
In addition, the default behavior for offline index build work is modified to use all cores available to ndbmtd, rather than limiting itself to the core reserved for the I/O thread. Doing so can improve restart and restore times, as well as performance, availability, and the user experience.
This enhancement is implemented as follows:
The default value for BuildIndexThreads is changed from 0 to
128. This means that offline ordered index builds are now
multithreaded by default.
The default value for TwoPassInitialNodeRestartCopy is changed
from false to true. This means that an initial node restart
first copies all data from a “live” node to one that is
starting—without creating any indexes—builds ordered indexes
offline, and then again synchronizes its data with the live
node; that is, it synchronizes twice and builds indexes offline
between the two synchronizations. This causes an initial node
restart to behave more like the normal restart of a node, and
reduces the time required for building indexes.
A new thread type (idxbld) is defined for the ThreadConfig
configuration parameter, to allow locking of offline index build
threads to specific CPUs.
In addition, NDB now distinguishes the thread types that are
accessible to ThreadConfig by the following two criteria:
Whether the thread is an execution thread. Threads of types
main, ldm, recv, rep, tc, and send are execution threads; thread
types io, watchdog, and idxbld are not.
Whether the allocation of the thread to a given task is
permanent or temporary. Currently all thread types except idxbld
are permanent.
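As an illustration only, a ThreadConfig setting in config.ini
that locks offline index build threads to their own CPUs might
look like this (the CPU numbers and thread counts shown here are
hypothetical, not recommendations):
[ndbd default]
# ldm and main threads bound to dedicated CPUs; idxbld threads
# restricted to a separate CPU set used only during index builds
ThreadConfig=ldm={count=4,cpubind=1,2,3,4},main={cpubind=0},idxbld={cpuset=10,11}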
For additional information, see the descriptions of the parameters in the Manual. (Bug #25835748, Bug #26928111)
Added the ODirectSyncFlag configuration parameter for data
nodes. When enabled, the data node treats all completed file
system writes to the redo log as though they had been performed
using fsync.
This parameter has no effect if at least one of the following conditions is true:
ODirect is not enabled.
InitFragmentLogFiles is set to SPARSE.
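For example, both flags could be enabled together in the
[ndbd default] section of config.ini; this fragment is shown
only as an illustration:
[ndbd default]
ODirect=1
ODirectSyncFlag=1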
(Bug #25428560)
Added the ndbinfo.error_messages table, which provides
information about NDB Cluster errors, including error codes,
status types, brief descriptions, and classifications. This
makes it possible to obtain error information using SQL in the
mysql client (or other MySQL client program), like this:
mysql> SELECT * FROM ndbinfo.error_messages WHERE error_code='321';
+------------+----------------------+-----------------+----------------------+
| error_code | error_description | error_status | error_classification |
+------------+----------------------+-----------------+----------------------+
| 321 | Invalid nodegroup id | Permanent error | Application error |
+------------+----------------------+-----------------+----------------------+
1 row in set (0.00 sec)
The query just shown provides equivalent information to that obtained by issuing ndb_perror 321 or (now deprecated) perror --ndb 321 on the command line. (Bug #86295, Bug #26048272)
ThreadConfig now has an additional nosend parameter that can be
used to prevent a main, ldm, rep, or tc thread from assisting
the send threads, by setting this parameter to 1 for the given
thread. By default, nosend is 0. It cannot be used with threads
other than those of the types just listed.
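A minimal sketch of such a setting in config.ini, assuming an
otherwise default thread layout (the thread counts shown are
arbitrary):
[ndbd default]
# Keep the main thread from assisting the send threads
ThreadConfig=main={nosend=1},ldm={count=4},recv={count=2}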
(WL #11554)
When executing a scan as a pushed join, all instances of DBSPJ
were involved in the execution of a single query; some of these
received multiple requests from the same query. This situation
is improved by enabling a single SPJ request to handle a set of
root fragments to be scanned, such that only a single SPJ
request is sent to each DBSPJ instance on each node. Because
batch sizes are allocated per fragment, the multi-fragment scan
can obtain a larger total batch size, allowing for some
scheduling optimizations to be done within DBSPJ, which can scan
a single fragment at a time (giving it the total batch size
allocation), scan all fragments in parallel using smaller
sub-batches, or use some combination of the two.
Since the effect of this change is generally to require fewer SPJ requests and instances, performance of pushed-down joins should be improved in many cases. (WL #10234)
As part of work ongoing to optimize bulk DDL performance by ndbmtd, it is now possible to obtain performance improvements by increasing the batch size for the bulk data parts of DDL operations which process all of the data in a fragment or set of fragments using a scan. Batch sizes are now made configurable for unique index builds, foreign key builds, and online reorganization, by setting the respective data node configuration parameters listed here:
MaxFKBuildBatchSize: Maximum scan batch size used for building
foreign keys.
MaxReorgBuildBatchSize: Maximum scan batch size used for
reorganization of table partitions.
MaxUIBuildBatchSize: Maximum scan batch size used for building
unique keys.
For each of the parameters just listed, the default value is 64, the minimum is 16, and the maximum is 512.
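As an illustration only, all three could be raised from their
defaults in the [ndbd default] section of config.ini (the value
128 is arbitrary, chosen from the permitted 16 to 512 range):
[ndbd default]
MaxFKBuildBatchSize=128
MaxReorgBuildBatchSize=128
MaxUIBuildBatchSize=128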
Increasing the appropriate batch size or sizes can help amortize inter-thread and inter-node latencies and make use of more parallel resources (local and remote) to help scale DDL performance. (WL #11158)
Formerly, the data node LGMAN kernel block processed undo log
records serially; now this is done in parallel. The rep thread,
which hands off undo records to local data handler (LDM)
threads, waited for an LDM to finish applying a record before
fetching the next one; now the rep thread no longer waits, but
proceeds immediately to the next record and LDM.
There are no user-visible changes in functionality directly associated with this work; this performance enhancement is part of the work being done in NDB 7.6 to improve undo log handling for partial local checkpoints. (WL #8478)
When applying an undo log, the table ID and fragment ID are
obtained from the page ID. This was done by reading the page
from PGMAN using an extra PGMAN worker thread, but when applying
the undo log it was necessary to read the page again. This
became very inefficient when using O_DIRECT (see ODirect), since
the page was not cached in the OS kernel.
Mapping from page ID to table ID and fragment ID is now done
using information the extent header contains about the table IDs
and fragment IDs of the pages used in a given extent. Since the
extent pages are always present in the page cache, no extra disk
reads are required to perform the mapping, and the information
can be read using existing TSMAN data structures.
(WL #10194)
Added the NODELOG DEBUG command in the ndb_mgm client to provide
runtime control over data node debug logging. NODELOG DEBUG ON
causes a data node to write extra debugging information to its
node log, the same as if the node had been started with
--verbose. NODELOG DEBUG OFF disables the extra logging.
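For example, in the ndb_mgm client the command can be directed
at a single data node; the node ID shown here is arbitrary:
ndb_mgm> 5 NODELOG DEBUG ON
ndb_mgm> 5 NODELOG DEBUG OFF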
(WL #11216)
Added the LocationDomainId
configuration
parameter for management, data, and API nodes. When using NDB
Cluster in a cloud environment, you can set this parameter to
assign a node to a given availability domain or availability
zone. This can improve performance in the following ways:
If requested data is not found on the same node, reads can be directed to another node in the same availability domain.
Communication between nodes in different availability domains is
guaranteed to use NDB transporters' WAN support without any
further manual intervention.
The transporter's group number can be based on which availability domain is used, so that SQL and other API nodes also communicate with local data nodes in the same availability domain whenever possible.
The arbitrator can be selected from an availability domain in which no data nodes are present, or, if no such availability domain can be found, from a third availability domain.
This parameter takes an integer value between 0 and 16, with 0
being the default; using 0 is the same as leaving
LocationDomainId unset.
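A minimal sketch of how the parameter might be set for a data
node in config.ini (the node ID, host name, and domain number
are all hypothetical):
[ndbd]
NodeId=3
HostName=ndbd-host-a
LocationDomainId=1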
(WL #10172)
Important Change:
The --passwd option for ndb_top is now deprecated. It is removed
(and replaced with --password) in NDB 7.6.5.
(Bug #88236, Bug #20733646)
References: See also: Bug #86615, Bug #26236320, Bug #26907833.
Replication:
With GTIDs generated for incident log events, MySQL error code
1590 (ER_SLAVE_INCIDENT) could not be skipped using the
--slave-skip-errors=1590 startup option on a replication slave.
(Bug #26266758)
NDB Disk Data:
An ALTER TABLE that switched the table storage format between
MEMORY and DISK was always performed in place for all columns.
This is not correct in the case of a column whose storage format
is inherited from the table; the column's storage type is not
changed.
For example, this statement creates a table t1 whose column c2
uses in-memory storage since the table does so implicitly:
CREATE TABLE t1 (c1 INT PRIMARY KEY, c2 INT) ENGINE NDB;
The ALTER TABLE statement shown here is expected to cause c2 to
be stored on disk, but failed to do so:
ALTER TABLE t1 STORAGE DISK TABLESPACE ts1;
Similarly, an on-disk column that inherited its storage format
from the table to which it belonged did not have the format
changed by ALTER TABLE ... STORAGE MEMORY.
These two cases are now performed as a copying alter, and the storage format of the affected column is now changed. (Bug #26764270)
NDB Replication:
On an SQL node not being used for a replication channel, with
sql_log_bin=0, it was possible, after creating and populating an
NDB table, for a table map event to be written to the binary log
for the created table with no corresponding row events. This led
to problems when this log was later used by a slave cluster
replicating from the mysqld where this table was created.
Fixed this by adding support for maintaining a cumulative
any_value bitmap for global checkpoint event operations that
represents bits set consistently for all rows of a specific
table in a given epoch, and by adding a check to determine
whether all operations (rows) for a specific table are all
marked as NOLOGGING, to prevent the addition of this table to
the Table_map held by the binlog injector.
As part of this fix, the NDB API adds a new
getNextEventOpInEpoch3() method which provides information about
any AnyValue received by making it possible to retrieve the
cumulative any_value bitmap.
(Bug #26333981)
ndbinfo Information Database:
Counts of committed rows and committed operations per fragment
used by some tables in ndbinfo were taken from the DBACC block,
but due to the fact that commit signals can arrive out of order,
transient counter values could be negative. This could happen
if, for example, a transaction contained several interleaved
insert and delete operations on the same row; in such cases,
commit signals for delete operations could arrive before those
for the corresponding insert operations, leading to a failure in
DBACC.
This issue is fixed by using the counts of committed rows which
are kept in DBTUP, which do not have this problem.
(Bug #88087, Bug #26968613)
Errors in parsing NDB_TABLE modifiers could cause memory leaks.
(Bug #26724559)
Added DUMP code 7027 to facilitate testing of issues relating to
local checkpoints. For more information, see DUMP 7027.
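DUMP codes such as this one are issued from the ndb_mgm client;
a purely illustrative invocation against all data nodes (any
additional arguments the code may accept are omitted here) is:
ndb_mgm> ALL DUMP 7027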
(Bug #26661468)
A previous fix intended to improve logging of node failure handling in the transaction coordinator included logging of transactions that could occur in normal operation, which made the resulting logs needlessly verbose. Such normal transactions are no longer written to the log in such cases. (Bug #26568782)
References: This issue is a regression of: Bug #26364729.
Due to a configuration file error, CPU locking capability was not available on builds for Linux platforms. (Bug #26378589)
Some DUMP codes used for the LGMAN kernel block were incorrectly
assigned numbers in the range used for codes belonging to DBTUX.
These have now been assigned symbolic constants and numbers in
the proper range (10001, 10002, and 10003).
(Bug #26365433)
Node failure handling in the DBTC kernel block consists of a
number of tasks which execute concurrently, all of which must
complete before TC node failure handling is complete. This fix
extends logging coverage to record when each task completes and
which tasks remain, and includes the following improvements:
Handling of interactions between GCP and node failure handling, in which TC takeover causes a GCP participant stall at the master TC to allow it to extend the current GCI with any transactions that were taken over; the stall can begin and end in different GCP protocol states. Logging coverage is extended to cover all such scenarios, and debug logging is now more consistent and understandable to users.
Logging done by the QMGR block as it monitors the duration of
node failure handling is performed more frequently. A warning
log is now generated every 30 seconds (instead of every minute),
and this now includes DBDIH block debug information (formerly
this was written separately, and less often).
To reduce space used, DBTC instance number: is shortened to
number:DBTC.
A new error code is added to assist testing.
(Bug #26364729)
During a restart, DBLQH loads redo log part metadata for each
redo log part it manages, from one or more redo log files. Since
each file has a limited capacity for metadata, the number of
files which must be consulted depends on the size of the redo
log part. These files are opened, read, and closed sequentially,
but the closing of one file occurs concurrently with the opening
of the next.
In cases where closing of the file was slow, it was possible for
more than 4 files per redo log part to be open concurrently;
since these files were opened using the OM_WRITE_BUFFER option,
more than 4 chunks of write buffer were allocated per part in
such cases. The write buffer pool is not unlimited; if all redo
log parts were in a similar state, the pool was exhausted,
causing the data node to shut down.
This issue is resolved by avoiding the use of OM_WRITE_BUFFER
during metadata reload, so that any transient opening of more
than 4 redo log files per log file part no longer leads to
failure of the data node.
(Bug #25965370)
Following TRUNCATE TABLE on an NDB table, its AUTO_INCREMENT ID
was not reset on an SQL node not performing binary logging.
(Bug #14845851)
A join entirely within the materialized part of a semijoin was
not pushed even if it could have been. In addition, EXPLAIN
provided no information about why the join was not pushed.
(Bug #88224, Bug #27022925)
References: See also: Bug #27067538.
When the duplicate weedout algorithm was used for evaluating a semijoin, the result had missing rows. (Bug #88117, Bug #26984919)
References: See also: Bug #87992, Bug #26926666.
A table used in a loose scan could be used as a child in a pushed join query, leading to possibly incorrect results. (Bug #87992, Bug #26926666)
When representing a materialized semijoin in the query plan, the
MySQL optimizer inserted extra QEP_TAB and JOIN_TAB objects to
represent access to the materialized subquery result. The join
pushdown analyzer did not properly set up its internal data
structures for these, leaving them uninitialized instead. This
meant that later usage of any item objects referencing the
materialized semijoin accessed an uninitialized tableno column
when accessing a 64-bit tableno bitmask, possibly referring to a
point beyond its end, leading to an unplanned shutdown of the
SQL node.
(Bug #87971, Bug #26919289)
In some cases, a SCAN_FRAGCONF signal was received after a
SCAN_FRAGREQ with a close flag had already been sent, clearing
the timer. When this occurred, the next SCAN_FRAGREF to arrive
caused time tracking to fail. Now in such cases, a check for a
cleared timer is performed prior to processing the SCAN_FRAGREF
message.
(Bug #87942, Bug #26908347)
While deleting an element in Dbacc, or moving it during hash
table expansion or reduction, the method used
(getLastAndRemove()) could return a reference to a removed
element on a released page, which could later be referenced from
the functions calling it. This was due to a change brought about
by the implementation of dynamic index memory in NDB 7.6.2;
previously, the page had always belonged to a single Dbacc
instance, so accessing it was safe. This was no longer the case
following the change; a page released in Dbacc could be placed
directly into the global page pool where any other thread could
then allocate it.
Now we make sure that newly released pages in Dbacc are kept
within the current Dbacc instance and not given over directly to
the global page pool. In addition, the reference to a released
page has been removed; the affected internal method now returns
the last element by value, rather than by reference.
(Bug #87932, Bug #26906640)
References: See also: Bug #87987, Bug #26925595.
The DBTC kernel block could receive a TCRELEASEREQ signal in a
state for which it was unprepared. Now in such cases it responds
with a TCRELEASECONF message, and subsequently behaves just as
if the API connection had failed.
(Bug #87838, Bug #26847666)
References: See also: Bug #20981491.
When a data node was configured for locking threads to CPUs, it failed during startup with Failed to lock tid.
This was a side effect of a fix for a previous issue, which
disabled CPU locking based on the version of the available
glibc. The specific glibc issue being guarded against is
encountered only in response to an internal NDB API call
(Ndb_UnlockCPU()) not used by data nodes (and which can be
accessed only through internal API calls). The current fix
enables CPU locking for data nodes and disables it only for the
relevant API calls when an affected glibc version is used.
(Bug #87683, Bug #26758939)
References: This issue is a regression of: Bug #86892, Bug #26378589.
ndb_top failed to build on platforms where the ncurses library
did not define stdscr. Now these platforms require the tinfo
library to be included.
(Bug #87185, Bug #26524441)
On completion of a local checkpoint, every node sends an
LCP_COMPLETE_REP signal to every other node in the cluster; a
node does not consider the LCP complete until it has been
notified that all other nodes have sent this signal. Due to a
minor flaw in the LCP protocol, if this message was delayed from
a node other than the master, it was possible to start the next
LCP before one or more nodes had completed the one ongoing; this
caused problems with LCP_COMPLETE_REP signals from previous LCPs
becoming mixed up with such signals from the current LCP, which
in turn led to node failures.
To fix this problem, we now ensure that the previous LCP is
complete before responding to any TCGETOPSIZEREQ signal
initiating a new LCP.
(Bug #87184, Bug #26524096)
NDB Cluster did not compile successfully when the build used
WITH_UNIT_TESTS=OFF.
(Bug #86881, Bug #26375985)
Recent improvements in local checkpoint handling that use
OM_CREATE to open files did not work correctly on Windows
platforms, where the system tried to create a new file and
failed if it already existed.
(Bug #86776, Bug #26321303)
A potential hundredfold signal fan-out when sending a
START_FRAG_REQ signal could lead to a node failure due to a job
buffer full error in start phase 5 while trying to perform a
local checkpoint during a restart.
(Bug #86675, Bug #26263397)
References: See also: Bug #26288247, Bug #26279522.
Compilation of NDB Cluster failed when using -DWITHOUT_SERVER=1
to build only the client libraries.
(Bug #85524, Bug #25741111)
The NDBFS block's OM_SYNC flag is intended to make sure that all
FSWRITEREQ signals used for a given file are synchronized, but
was ignored by platforms that do not support O_SYNC, meaning
that this feature did not behave properly on those platforms.
Now the synchronization flag is used on those platforms that do
not support O_SYNC.
(Bug #76975, Bug #21049554)