MySQL NDB Cluster 8.4 Release Notes
MySQL NDB Cluster 8.4.4 is a new LTS release of NDB 8.4, based on MySQL Server 8.4 and including features in version 8.4 of the NDB storage engine, as well as fixing recently discovered bugs in previous NDB Cluster releases.
Obtaining MySQL NDB Cluster 8.4. NDB Cluster 8.4 source code and binaries can be obtained from https://dev.mysql.com/downloads/cluster/.
For an overview of major changes made in NDB Cluster 8.4, see What is New in MySQL NDB Cluster 8.4.
This release also incorporates all bug fixes and changes made in previous NDB Cluster releases, as well as all bug fixes and feature changes which were added in mainline MySQL 8.4 through MySQL 8.4.4 (see Changes in MySQL 8.4.4 (2025-01-21, LTS Release)).
macOS: A uint64_t value used with the %zu format specifier caused a [-Wformat] compiler warning on macOS. (Bug #37174692)
Removed a warning in storage/ndb/src/common/util/cstrbuf.cpp. (Bug #37049014)
Microsoft Windows: Successive iterations of the sequence ndb_sign_keys --create-key followed by ndb_sign_keys --promote were unsuccessful on Windows. (Bug #36951132)
NDB Disk Data: mysqld did not use a disk scan for NDB tables with 256 or more disk columns. (Bug #37201922)
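For context, a disk column is a nonindexed column of an NDB table stored on disk in a Disk Data tablespace rather than in memory. The following is a minimal sketch of such a table, in which the log file group, tablespace, table, and file names are all placeholders; the bug affected tables having 256 or more columns declared in this way:

-- Disk Data storage requires a log file group for undo logging
CREATE LOGFILE GROUP lg_1
    ADD UNDOFILE 'undo_1.log'
    ENGINE=NDBCLUSTER;

-- ...and a tablespace holding the on-disk table data
CREATE TABLESPACE ts_1
    ADD DATAFILE 'data_1.dat'
    USE LOGFILE GROUP lg_1
    ENGINE=NDBCLUSTER;

-- Columns declared with STORAGE DISK are disk columns
CREATE TABLE t_disk (
    pk INT PRIMARY KEY,
    c1 VARCHAR(255) STORAGE DISK,
    c2 VARCHAR(255) STORAGE DISK
) TABLESPACE ts_1 ENGINE=NDBCLUSTER;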
NDB Replication: The replication applier normally retries temporary errors occurring while applying transactions. Such retry logic is not performed for transactions containing row events in which the STMT_END_F flag is missing; instead, the statement is committed in an additional step while applying the subsequent COMMIT query event, while tables are still locked. Problems arose when committing this statement because temporary errors were not handled properly. Replica skip-error functionality was also affected, in that it attempted to skip only the error that occurred when a transaction was committed a second time.
For example: the binary log contains an epoch transaction with writes from multiple server IDs on the source, and the replica uses IGNORE_SERVER_IDS (<last_server_id_in_binlog>) to cause the STMT_END_F flag to be filtered away, so that the statement is committed from the COMMIT query log event on the applier. A lock held on one of the rows to be updated by the applier then triggered error handling, which caused replication to stop with an error, with no retries being performed.
We now handle such errors, logging all messages in diagnostics areas (as is already done for row log events) and then retrying the transaction. (Bug #37331118)
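For reference, filtering events by originating server ID, as in the scenario just described, is configured on the replica along these lines (the server ID value shown is a placeholder):

-- Replication must be stopped while the source configuration is changed
STOP REPLICA;
CHANGE REPLICATION SOURCE TO IGNORE_SERVER_IDS = (3);
START REPLICA;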
NDB Replication: When a MySQL server performing binary logging connects to an NDB Cluster, it checks for existing binary logs; if it finds any, it writes an Incident event to a log file of its own so that any downstream replicas can detect the potential for lost events. Problems arose under some circumstances because it was possible for the timestamps of events logged in this file to be out of order; the Incident event was written following other events but had a smaller timestamp than these preceding events. We fix this issue by ensuring that a fresh timestamp is used prior to writing an incident to the binary log on startup, rather than one which may have been obtained and held for some time previously. (Bug #37228735)
NDB Cluster APIs: The Ndb_cluster_connection destructor calls g_eventLogger::stopAsync() in order to release the buffers used by the asynchronous logging mechanism, as well as to stop the threads responsible for this logging. When the g_eventLogger object was deleted before the Ndb_cluster_connection destructor was called, the application terminated after trying to use a method on a null object. This could happen in either of two ways:

- An API program deleted the logger object before deleting the Ndb_cluster_connection.

- ndb_end() was called before the Ndb_cluster_connection was deleted.

We solve this issue by skipping the call to stopAsync() in the Ndb_cluster_connection destructor when g_eventLogger is NULL. This fix also adds a warning to inform API users that deleting g_eventLogger before calling the Ndb_cluster_connection destructor is incorrect usage.
For more information, see API Initialization and Cleanup. (Bug #37300558)
NDB Cluster APIs: Removed known causes of API node versus data node state misalignments, and improved the handling of such misalignments when detected. In one such case, separate handling of scan errors in the NDB kernel and those originating in API programs led to cleanup not being performed after some scans. This set of fixes improves the handling of DBTC and API state alignment errors, as well as scan protocol timeout handling in DBSPJ; now, when such misalignments in state are detected, the API nodes involved are disconnected, rather than the data node detecting the problem being forced to shut down.
(Bug #20430083, Bug #22782511, Bug #23528433, Bug #28505289, Bug #36273474, Bug #36395384, Bug #36838756, Bug #37022773, Bug #37022901, Bug #37023549)
References: See also: Bug #22782511, Bug #23528433, Bug #36273474, Bug #36395384, Bug #36838756.
ndbinfo Information Database: At table create and drop time, access of ndbinfo tables such as operations_per_fragment and memory_per_fragment sometimes examined data which was not valid. To fix this, scans of these ndbinfo tables now ignore any fragments belonging to tables that are in a transient state because they are being created or dropped. (Bug #37140331)
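As an illustration, per-fragment memory usage can be inspected with a query along the following lines (a sketch only; the column selection here is abbreviated):

-- Show the ten fragments using the most fixed-element memory
SELECT fq_name, node_id, fragment_num, fixed_elem_alloc_bytes
FROM ndbinfo.memory_per_fragment
ORDER BY fixed_elem_alloc_bytes DESC
LIMIT 10;

With this fix, such a scan skips fragments of any table that is concurrently being created or dropped.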
Work done previously to support opening NDB tables with missing indexes was intended to allow the features of the MySQL server to be used to solve problems in cases where indexes cannot be rebuilt due to unmet constraints. With missing indexes, some of the SQL handler functionality is unavailable, such as using indexes to select rows efficiently for modification, to identify duplicates when processing modifications, or to push down joins relying on indexes. This could lead to the unplanned shutdown of an NDB Cluster SQL node.
In such cases, the server now simply returns an error. (Bug #37299071)
Recent refactoring of the transporter layer added the reporting of the presence of socket shutdown errors, but not their nature. This led to confusion in the common case where a socket shutdown is requested, but the socket is already closed by the peer. To avoid such confusion, this logging has been removed. (Bug #37243135)
References: This issue is a regression of: Bug #35750771.
It was not possible to create an NDB table with 256 or more BLOB columns when also specifying a reduced inline size, as in the following SQL statement:
CREATE TABLE t1 (
    pk INT PRIMARY KEY,
    b1 BLOB COMMENT 'NDB_COLUMN=BLOB_INLINE_SIZE=100',
    b2 BLOB COMMENT 'NDB_COLUMN=BLOB_INLINE_SIZE=100',
    ...,
    b256 BLOB COMMENT 'NDB_COLUMN=BLOB_INLINE_SIZE=100'
) ENGINE=NDBCLUSTER;
(Bug #37201818)
In some cases, the occurrence of node failures during shutdown led to the cluster becoming unrecoverable without manual intervention.
We fix this by modifying global checkpoint ID (GCI) information propagation (the CopyGCI mechanism) to reject propagation of any set of GCI information which does not describe the ability to recover the cluster automatically as part of a system restart. (Bug #37163647)
References: See also: Bug #37162636.
In some cases, node failures during an otherwise graceful shutdown could lead to a cluster becoming unrecoverable without manual intervention. This fix modifies the generic GCI information propagation mechanism (CopyGCI) to reject propagating any set of GCI information which does not describe the ability to recover a cluster automatically. (Bug #37162636)
Improved the variable names used in start_resend(), and enhanced related debug messages with additional information for users and developers. (Bug #37157987)
In certain cases, a COPY_FRAGREQ signal did not honor a fragment scan lock. (Bug #37125935)
In cases where NDB experienced an API protocol timeout when attempting to close a scan operation, it considered the DBTC ApiConnectRecord involved to be lost for further use, at least until the API disconnected and API failure handling within DBTC reclaimed the record. This has been improved by having the API send a TCRELEASEREQ signal to DBTC in such cases, performing API failure handling for a single ApiConnectRecord within DBTC. (Bug #37023661)
References: See also: Bug #36273474, Bug #36395384, Bug #37022773, Bug #37022901, Bug #37023549.
For tables using the NDB storage engine, the column comment option BLOB_INLINE_SIZE was silently ignored for TINYBLOB columns, whose inline size (equally silently) defaulted to the hard-coded value of 256 bytes regardless of the size provided; this was misleading to users. To fix this problem, we now disallow BLOB_INLINE_SIZE on TINYBLOB columns altogether, and NDB now prints a warning saying that the column size is defaulting to 256 bytes. (Bug #36725332)
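As an illustration, in a statement such as the following (a hypothetical table), the option is honored for the BLOB column, while for the TINYBLOB column it now draws the warning just described, with the inline size remaining at the fixed 256 bytes:

CREATE TABLE t2 (
    pk INT PRIMARY KEY,
    -- honored: BLOB columns accept an explicit inline size
    b1 BLOB COMMENT 'NDB_COLUMN=BLOB_INLINE_SIZE=1024',
    -- disallowed: TINYBLOB inline size is fixed at 256 bytes
    tb TINYBLOB COMMENT 'NDB_COLUMN=BLOB_INLINE_SIZE=100'
) ENGINE=NDBCLUSTER;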
Testing revealed that a fix for a previous issue, which added a check of the ApiConnectRecord failure number against the system's current failure number, did not initialize the ApiConnectRecord failure number in all cases. (Bug #36155195)
References: This issue is a regression of: Bug #36028828.
ndb_config did not always handle very long file paths correctly.
Our thanks to Dirkjan Bussink for the contribution. (Bug #116748, Bug #37310680)
Errors of unknown provenance were logged while assigning node IDs during cluster synchronization, leading to user doubt and concern. Logging by the data node QMGR block and by the ndb_mgmd process relating to node ID allocation issues has therefore been improved, to supply more and better information about what is being reported in such cases. (Bug #116351, Bug #37189356)
A multi-range scan sometimes lost its fragment lock for the second and subsequent ranges of the scan. (Bug #111932, Bug #35660890)