MySQL NDB Cluster 7.6 Release Notes
NDB Disk Data: A new file format is introduced in this release for NDB Disk Data tables. The new format provides a mechanism whereby each Disk Data table can be uniquely identified without reusing table IDs. This is intended to help resolve issues with page and extent handling that were visible to the user as problems with rapid creating and dropping of Disk Data tables, and for which the old format did not provide a ready means to fix.
The new format is now used whenever new undo log file groups and tablespace data files are created. Files relating to existing Disk Data tables continue to use the old format until their tablespaces and undo log file groups are re-created. Important: The old and new formats are not compatible and so cannot be employed for different data files or undo log files that are used by the same Disk Data table or tablespace.
To avoid problems relating to the old format, you should re-create any existing tablespaces and undo log file groups when upgrading. You can do this by performing an initial restart of each data node (that is, using the --initial option) as part of the upgrade process. Since the current release is a pre-GA Developer release, this initial node restart is optional for now, but you should expect it (and prepare for it now) to be mandatory in GA versions of NDB 7.6.
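A rolling initial restart such as the one just described could be scripted along the following lines. This is only a minimal sketch: the management server address and data node IDs are assumptions for illustration, ndb_mgm must be on the path, and in practice you should wait for each node to rejoin the cluster (for example, by polling ndb_mgm -e SHOW or using ndb_waiter) before restarting the next one.

    # Minimal sketch of a rolling initial restart; the node IDs and the
    # management server address below are assumptions for illustration only.
    import subprocess

    MGM_HOST = "mgmhost:1186"   # hypothetical management server connect string
    DATA_NODES = [1, 2]         # hypothetical data node IDs

    for node_id in DATA_NODES:
        # Ask the management server to restart this node with --initial,
        # so that its Disk Data files are re-created (see above).
        subprocess.run(
            ["ndb_mgm", "-c", MGM_HOST, "-e", f"{node_id} RESTART -i"],
            check=True,
        )
        # In practice, wait here until the node has rejoined the cluster
        # (for example, poll "ndb_mgm -e SHOW" or use ndb_waiter) before
        # continuing with the next node.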
If you are using Disk Data tables, a downgrade from any NDB 7.6 release to any NDB 7.5 or earlier release requires restarting data nodes with --initial as part of the downgrade process, due to the fact that NDB 7.5 and earlier releases cannot read the new Disk Data file format.
For more information, see Upgrading and Downgrading NDB Cluster. (WL #9778)
Packaging: NDB Cluster Auto-Installer RPM packages for SLES 12 failed due to a dependency on python2-crypto instead of python-pycrypto. (Bug #25399608)
NDB Disk Data: Stale data from NDB Disk Data tables that had been dropped could potentially be included in backups, because disk scans were still enabled for such tables. To prevent this possibility, disk scans (like other types of scans) are now disabled when taking a backup. (Bug #84422, Bug #25353234)
NDB Cluster APIs: When signals were sent while the client process was receiving signals such as SUB_GCP_COMPLETE_ACK and TC_COMMIT_ACK, these signals were temporarily buffered in the send buffers of the clients which sent them. If not explicitly flushed, the signals remained in these buffers until the client woke up again and flushed its buffers. Because no attempt was made to enforce an upper limit on how long a signal could remain unsent in the local client buffers, this could lead to timeouts and other misbehavior in the components waiting for these signals.
In addition, the fix for a previous, related issue likely made this situation worse by removing client wakeups during which the client send buffers could have been flushed.
The current fix moves responsibility for flushing the messages sent by receivers to the receiver (the poll_owner client). This means that it is no longer necessary to wake up all clients merely to have them flush their buffers. Instead, the poll_owner client (which is already running) flushes the send buffers for whatever was sent while delivering signals to the recipients.
(Bug #22705935)
References: See also: Bug #18753341, Bug #23202735.
NDB Cluster APIs: The adaptive send algorithm was not used as expected, resulting in every execution request being sent to the NDB kernel immediately, instead of first trying to collect multiple requests into larger blocks before sending them. This incurred a performance penalty on the order of 10%. The issue was due to the transporter layer always handling the forceSend argument used in several API methods (including nextResult() and close()) as true.
(Bug #82738, Bug #24526123)
The ndb_print_backup_file utility failed when attempting to read from a backup file when the backup included a table having more than 500 columns. (Bug #25302901)
References: See also: Bug #25182956.
ndb_restore did not restore tables having more than 341 columns correctly. This was due to the fact that the buffer used to hold table metadata read from .ctl files was of insufficient size, so that only part of the table descriptor could be read from it in such cases. This issue is fixed by increasing the size of the buffer used by ndb_restore for file reads.
(Bug #25182956)
References: See also: Bug #25302901.
No traces were written when ndbmtd received a signal in any thread other than the main thread, due to the fact that all signals were blocked for other threads. This issue is fixed by the removal of SIGBUS, SIGFPE, SIGILL, and SIGSEGV signals from the list of signals being blocked.
(Bug #25103068)
The ndb_show_tables utility did not display type information for hash maps or fully replicated triggers. (Bug #24383742)
The NDB Cluster Auto-Installer did not show the user how to force an exit from the application (CTRL+C). (Bug #84235, Bug #25268310)
The NDB Cluster Auto-Installer failed to exit when it was unable to start the associated service. (Bug #84234, Bug #25268278)
The NDB Cluster Auto-Installer failed when the port specified by the --port option (or the default port 8081) was already in use. Now in such cases, when the required port is not available, the next 20 ports are tested in sequence, with the first one available being used; only if all of these are in use does the Auto-Installer fail.
(Bug #84233, Bug #25268221)
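The fallback behavior can be pictured with the following sketch. This is not the Auto-Installer's actual code; it simply shows one way of probing a base port and the 20 ports that follow it, which is the approach described above.

    # Illustrative only: probe the requested port and the next 20 ports,
    # returning the first one that can be bound.
    import socket

    def find_free_port(base_port=8081, attempts=21):
        for port in range(base_port, base_port + attempts):
            with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
                try:
                    s.bind(("", port))
                except OSError:
                    continue        # port in use; try the next one
                return port         # first available port
        raise RuntimeError("no free port found in range")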
Multiple instances of the NDB Cluster Auto-Installer were not detected. This could lead to inadvertent multiple deployments on the same hosts, stray processes, and similar issues. This issue is fixed by having the Auto-Installer create a PID file (mcc.pid), which is removed upon a successful exit.
(Bug #84232, Bug #25268121)
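The general PID-file technique works roughly as in the following sketch. Only the file name mcc.pid comes from this entry; the location, checks, and error handling here are assumptions for illustration.

    # Rough sketch of PID-file handling; only the name mcc.pid is taken
    # from the release note, the rest is illustrative.
    import os
    import sys

    PID_FILE = "mcc.pid"

    def acquire_pid_file():
        if os.path.exists(PID_FILE):
            # Another instance appears to be running (or exited uncleanly).
            sys.exit("another instance seems to be running; remove %s if stale"
                     % PID_FILE)
        with open(PID_FILE, "w") as f:
            f.write(str(os.getpid()))

    def release_pid_file():
        # Called on successful exit, mirroring the behavior of the fix.
        if os.path.exists(PID_FILE):
            os.remove(PID_FILE)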
When a data node running with StopOnError set to 0 underwent an unplanned shutdown, the automatic restart performed the same type of start as the previous one. In the case where the data node had previously been started with the --initial option, this meant that an initial start was performed, which in cases of multiple data node failures could lead to loss of data. This issue also occurred whenever a data node shutdown led to generation of a core dump. A check is now performed to catch all such cases, and to perform a normal restart instead.
In addition, in cases where a failed data node was unable to send start phase information to the angel process before shutting down, the shutdown was always treated as a startup failure, also leading to an initial restart. This issue is fixed by adding a check to execute startup failure handling only if a valid start phase was received from the client. (Bug #83510, Bug #24945638)
Data nodes that were shut down when the redo log was exhausted did not automatically trigger a local checkpoint when restarted, and required the use of DUMP 7099 to start one manually.
(Bug #82469, Bug #24412033)
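Before the fix, the manual workaround mentioned above could be applied from a script along the following lines; the management server connect string is an assumption, and ndb_mgm must be available on the path.

    # Illustrative only: issue DUMP 7099 (start a local checkpoint manually)
    # through the ndb_mgm client; the connect string is an assumption.
    import subprocess

    subprocess.run(
        ["ndb_mgm", "-c", "mgmhost:1186", "-e", "ALL DUMP 7099"],
        check=True,
    )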
When a data node was restarted, the node was first stopped, and then, after a fixed wait, the management server assumed that the node had entered the NOT_STARTED state, at which point the node was sent a start signal. If the node was not ready because it had not yet completed stopping (and was therefore not actually in NOT_STARTED), the signal was silently ignored.
To fix this issue, the management server now checks to see whether the data node has in fact reached the NOT_STARTED state before sending the start signal. The wait for the node to reach this state is split into two separate checks:
Wait for data nodes to start shutting down (maximum 12 seconds)
Wait for data nodes to complete shutting down and reach NOT_STARTED state (maximum 120 seconds)
If either of these cases times out, the restart is considered failed, and an appropriate error is returned. (Bug #49464, Bug #11757421)
References: See also: Bug #28728485.
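The two-stage wait described in this entry can be pictured roughly as follows. Here get_node_status() and send_start_signal() are hypothetical stand-ins for the management server's internal checks; only the 12- and 120-second limits come from the entry above.

    # Rough sketch of the two-stage wait; get_node_status() and
    # send_start_signal() are hypothetical stand-ins, not real APIs.
    import time

    def wait_for(predicate, timeout):
        deadline = time.monotonic() + timeout
        while time.monotonic() < deadline:
            if predicate():
                return True
            time.sleep(0.5)
        return False

    def restart_node(node_id, get_node_status, send_start_signal):
        # Stage 1: wait up to 12 seconds for the node to begin shutting down.
        if not wait_for(lambda: get_node_status(node_id)
                        in ("STOPPING", "NOT_STARTED"), 12):
            raise RuntimeError("restart failed: node did not begin shutting down")
        # Stage 2: wait up to 120 seconds for it to reach NOT_STARTED.
        if not wait_for(lambda: get_node_status(node_id) == "NOT_STARTED", 120):
            raise RuntimeError("restart failed: node did not reach NOT_STARTED")
        # Only now is it safe to send the start signal.
        send_start_signal(node_id)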