MySQL NDB Cluster 8.0 Release Notes
NDB Cluster APIs: The Node.js library used to build the MySQL NoSQL Connector for JavaScript has been upgraded to version 18.12.1. (Bug #35095122)
MySQL NDB ClusterJ: Performance has been improved for accessing tables using a single-column partition key when the column is of type CHAR or VARCHAR. (Bug #35027961)
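For illustration, a minimal sketch of the kind of table this improvement targets, using a hypothetical NDB table keyed and partitioned on a single VARCHAR column (all names here are illustrative):

-- Hypothetical table with a single-column VARCHAR partition key;
-- ClusterJ primary key lookups on customer_code benefit from the
-- improved hash handling described above.
CREATE TABLE customer (
  customer_code VARCHAR(32) NOT NULL,
  name VARCHAR(100),
  PRIMARY KEY (customer_code)
) ENGINE=NDB
  PARTITION BY KEY (customer_code);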
Beginning with this release, ndb_restore implements the --timestamp-printouts option, which causes all error, info, and debug node log messages to be prefixed with timestamps. (Bug #34110068)
Microsoft Windows: Two memory leaks found by code inspection were removed from NDB process handles on Windows platforms. (Bug #34872901)
Microsoft Windows: On Windows platforms, the data node angel process did not detect whether a child data node process exited normally. We fix this by keeping an open process handle to the child and using this when probing for the child's exit. (Bug #34853213)
NDB Cluster APIs; MySQL NDB ClusterJ: MySQL ClusterJ uses a scratch buffer for primary key hash calculations which was limited to 10000 bytes; this proved too small in some cases. Now we malloc() the buffer if its size is not sufficient.

This also fixes an issue with the Ndb object methods startTransaction() and computeHash() in the NDB API: previously, if either of these methods was passed a temporary buffer of insufficient size, the method failed. Now in such cases a temporary buffer is allocated.

Our thanks to Mikael Ronström for this contribution. (Bug #103814, Bug #32959894)
NDB Cluster APIs: When dropping an event operation (NdbEventOperation) in the NDB API, it was sometimes possible for the dropped event operation to remain visible to the application after instructing the data nodes to stop sending events related to this event operation, but before all pending buffered events were consumed and discarded. This could be observed in certain cases when performing an online alter operation, such as ADD COLUMN or RENAME COLUMN, along with concurrent writes to the affected table.

Further analysis showed that the dropped events were accessible when iterating through event operations with Ndb::getGCIEventOperations(). Now, this method skips dropped events when called iteratively. (Bug #34809944)
NDB Cluster APIs: Event::getReport() always returned an error for an event opened from NDB, instead of returning the flags actually used by the report object. (Bug #34667384)
Before a new NDB table definition can be stored in the data dictionary, any existing definition must be removed. Table definitions have two unique values, the table name and the NDB Cluster se_private_id. During installation of a new table definition, we check whether there is any existing definition with the same table name and, if so, remove it. Then we check whether the table removed and the one being installed have the same se_private_id; if they do not, any definition that is occupying this se_private_id is considered stale, and removed as well.

Problems arose when no existing definition was found by the search using the table's name, since no definition was dropped even if one occupied the se_private_id, leading to a duplicate key error when attempting to store the new table. The internal store_table() function attempted to clear the diagnostics area, remove the stale definition occupying the se_private_id, and try to store the table once again, but the diagnostics area was not actually cleared; the error was thus leaked and presented to the user.

To fix this, we remove any stale table definition, regardless of any action taken (or not) by store_table(). (Bug #35089015)
Fixed the following two issues in the output of ndb_restore:
The backup file format version was shown for both the backup file format version and the version of the cluster which produced the backup.
To reduce confusion between the version of the file format and the version of the cluster which produced the backup, the backup file format version is now shown using hexadecimal notation.
(Bug #35079426)
References: This issue is a regression of: Bug #34110068.
Removed a memory leak in the DBDICT kernel block caused when an internal foreign key definition record was not released when no longer needed. This could be triggered by either of the following events:

Drop of a foreign key constraint on an NDB table

Rejection of an attempt to create a foreign key constraint on an NDB table

Such records use the DISK_RECORDS memory resource; you can check this on a running cluster by executing SELECT node_id, used FROM ndbinfo.resources WHERE resource_name='DISK_RECORDS' in the mysql client. This resource uses SharedGlobalMemory, exhaustion of which could lead not only to the rejection of attempts to create foreign keys, but also to the rejection of queries making use of joins, since the DBSPJ block also uses shared global memory by way of QUERY_MEMORY. (Bug #35064142)
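Since QUERY_MEMORY draws on the same SharedGlobalMemory pool, a query along the following lines (a sketch that simply extends the query given above to cover both resources) can be used to watch them together:

-- Watch both SharedGlobalMemory consumers mentioned above.
SELECT node_id, resource_name, used
  FROM ndbinfo.resources
  WHERE resource_name IN ('DISK_RECORDS', 'QUERY_MEMORY')
  ORDER BY node_id, resource_name;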
When attempting a copying alter operation with --ndb-allow-copying-alter-table = OFF, the reason for rejection of the statement was not always made clear to the user. (Bug #35059079)
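For illustration, a statement of the kind affected might look like the following sketch, assuming the server was started with --ndb-allow-copying-alter-table=OFF and a hypothetical NDB table t1:

-- Changing a column's data type cannot be done in place in NDB,
-- so this alter requires the copying algorithm and is rejected
-- while copying alters are disallowed.
ALTER TABLE t1 MODIFY COLUMN c1 BIGINT;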
When a transaction coordinator is starting fragment scans with many fragments to scan, it may take a realtime break (RTB) during the process to ensure fair CPU access for other requests. When the requesting API disconnected and API failure handling for the scan state occurred before the RTB continuation returned, continuation processing could not proceed because the scan state had been removed.
We fix this by adding appropriate checks on the scan state as part of the continuation process. (Bug #35037683)
Sender and receiver signal IDs were printed in trace logs as signed values even though they are actually unsigned 32-bit numbers. This could result in confusion when the top bit was set, since it caused such numbers to be shown as negatives, counting upwards from -MAX_32_BIT_SIGNED_INT. (Bug #35037396)
A fiber used by the DICT block monitors all indexes, and triggers index statistics calculations if requested by DBTUX index fragment monitoring; these calculations are performed using a schema transaction. When the DICT fiber attempted but failed to seize a transaction handle for requesting a schema transaction to be started, the fiber exited, so that no more automated index statistics updates could be performed without a node failure. (Bug #34992370)
References: See also: Bug #34007422.
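As background to what this fiber automates: index statistics for an NDB table can also be recalculated on demand from an SQL node, as in this sketch (t1 is a hypothetical NDB table):

-- Recalculate index statistics for a hypothetical NDB table on demand.
ANALYZE TABLE t1;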
Schema objects in NDB use composite versioning, comprising major and minor subversions. When a schema object is first created, its major and minor versions are set; when an existing schema object is altered in place, its minor subversion is incremented.
At restart time each data node checks schema objects as part of recovery; for foreign key objects, the versions of referenced parent and child tables (and indexes, for foreign key references not to or from a table's primary key) are checked for consistency. The table version of this check compares only major subversions, allowing tables to evolve, but the index version also compares minor subversions; this resulted in a failure at restart time when an index had been altered.
We fix this by comparing only major subversions for indexes in such cases. (Bug #34976028)
References: See also: Bug #21363253.
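For illustration, one common in-place alteration of an NDB table, which per the description above increments only the minor subversion, is an online column add; a sketch with hypothetical names (the NULL and COLUMN_FORMAT DYNAMIC requirements are assumptions about what keeps the operation in place):

-- Online (in-place) add of a nullable dynamic column to an NDB table.
ALTER TABLE t1
  ALGORITHM=INPLACE,
  ADD COLUMN note VARCHAR(64) NULL COLUMN_FORMAT DYNAMIC;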
ndb_import sometimes silently ignored hint failure for tables having large VARCHAR primary keys. For hinting which transaction coordinator to use, ndb_import can use the row's partitioning key, using a 4092-byte buffer to compute the hash for the key. This was problematic when the key included a VARCHAR column using UTF8, since the hash buffer may need to be as large, in bytes, as 24 times the column's maximum number of characters, depending on the column's collation; the hash computation failed, but the calling code in ndb_import did not check for this, and continued using an undefined hash value which yielded an undefined hint.

This did not lead to any functional problems, but was not optimal, and the user was not notified of it.

We fix this by ensuring that ndb_import always uses a sufficient buffer for handling character columns (regardless of their collations) in the key, and by adding a check in ndb_import for any failures in hash computation, reporting these to the user. (Bug #34917498)
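A sketch of the kind of target table affected, with hypothetical names; with a 255-character multi-byte key column and the worst case of up to 24 bytes per character described above, the hash buffer requirement easily exceeds the old 4092-byte buffer:

-- Hypothetical ndb_import target table; the wide multi-byte VARCHAR
-- primary key is what previously caused the hash hint computation
-- to fail silently (255 characters x up to 24 bytes per character
-- is well over 4092 bytes).
CREATE TABLE product (
  sku VARCHAR(255) CHARACTER SET utf8mb4 NOT NULL,
  qty INT,
  PRIMARY KEY (sku)
) ENGINE=NDB;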
When the ndbcluster plugin creates the ndb_schema table, the plugin inserts a row containing metadata, which is needed to keep track of this NDB Cluster instance, and which is stored as a set of key-value pairs in a row in this table.

The ndb_schema table is hidden from MySQL and so cannot be queried using SQL, but contains a UUID generated by the same MySQL server that creates the ndb_schema table; the same UUID is also stored as metadata in the data dictionary of each MySQL server when the ndb_schema table is installed on it. When a mysqld connects (or reconnects) to NDB, it compares the UUID in its own data dictionary with the UUID stored in NDB in order to detect whether it is reconnecting to the same cluster; if not, the entire contents of the data dictionary are scrapped in order to make it faster and easier to install all tables fresh from NDB.

One such case occurs when all NDB data nodes have been restarted with --initial, thus removing all data and tables. Another happens when the ndb_schema table has been restored from a backup without restoring any of its data, since this means that the row for the ndb_schema table would be missing.

To deal with these types of situations, we now make sure that, when synchronization has completed, there is always a row in the NDB dictionary with a UUID matching the UUID stored in the MySQL server data dictionary. (Bug #34876468)
When running an NDB Cluster with multiple management servers, termination of the ndb_mgmd processes required an excessive amount of time when shutting down the cluster. (Bug #34872372)
Schema distribution timeout was detected by the schema distribution coordinator after dropping and re-creating the mysql.ndb_schema table when any nodes that were subscribed beforehand had not yet resubscribed when the next schema operation began. This was due to a stale list of subscribers being left behind in the schema distribution data; these subscribers were assumed by the coordinator to be participants in subsequent schema operations.

We fix this issue by clearing the list of known subscribers whenever the mysql.ndb_schema table is dropped. (Bug #34843412)
When requesting a new global checkpoint (GCP) from the data nodes, such as by the NDB Cluster handler in mysqld to speed up delivery of schema distribution events and responses, the request was sent 100 times. While the DBDIH block attempted to merge these duplicate requests into one, it was possible on occasion to trigger more than one immediate GCP. (Bug #34836471)
When the DBSPJ block receives a query for execution, it sets up its own internal plan for how to do so. This plan is based on the query plan provided by the optimizer, with adaptations made to provide the most efficient execution of the query, both in terms of elapsed time and of total resources used.

Query plans received by DBSPJ often contain star joins, in which several child tables depend on a common parent, as in the query shown here:
SELECT STRAIGHT_JOIN * FROM t AS t1 INNER JOIN t AS t2 ON t2.a = t1.k INNER JOIN t AS t3 ON t3.k = t1.k;
In such cases DBSPJ could submit key-range lookups to t2 and t3 in parallel (but does not do so). An inner join also has the property that each inner joined row requires a match from the other tables in the same join nest, else the row is eliminated from the result set. Thus, by using the key-range lookups, we may retrieve rows from one such lookup which have no matches in the other, and that effort is ultimately wasted. Instead, DBSPJ sets up a sequential plan for such a query.

It was found that this worked as intended for queries having only inner joins, but if any of the tables are left-joined, we did not take complete advantage of the preceding inner joined tables before issuing the outer joined tables. Suppose the previous query is modified to include a left join, like this:
SELECT STRAIGHT_JOIN * FROM t AS t1 INNER JOIN t AS t2 ON t2.a = t1.k LEFT JOIN t AS t3 ON t3.k = t1.k;
Using the following query against the ndbinfo.counters table, it is possible to observe how many rows are returned for each query before and after query execution:
SELECT counter_name, SUM(val) FROM ndbinfo.counters WHERE block_name="DBSPJ" AND counter_name = "SCAN_ROWS_RETURNED";
It was thus determined that requests on t2 and t3 were submitted in parallel. Now in such cases, we wait for the inner join to complete before issuing the left join, so that unmatched rows from t1 can be eliminated from the outer join on t1 and t3. This results in less work to be performed by the data nodes, and reduces the volume handled by the transporter as well. (Bug #34782276)
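For illustration, the before-and-after comparison described above can be carried out as in this sketch (the join is the left-join example from this note; GROUP BY is added to the counters query so it returns a single labeled total):

-- Baseline reading of rows returned by DBSPJ scans.
SELECT counter_name, SUM(val) AS total
  FROM ndbinfo.counters
  WHERE block_name = 'DBSPJ' AND counter_name = 'SCAN_ROWS_RETURNED'
  GROUP BY counter_name;

-- Execute the join under test.
SELECT STRAIGHT_JOIN * FROM t AS t1 INNER JOIN t AS t2 ON t2.a = t1.k LEFT JOIN t AS t3 ON t3.k = t1.k;

-- Re-run the counters query; the increase in total is the number of
-- rows the data nodes returned for this query.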
SPJ handling of a sorted result was found to suffer a significant performance impact compared to the same result set when not sorted. Further investigation showed that most of the additional performance overhead for sorted results lay in the implementation for sorted result retrieval, which required an excessive number of SCAN_NEXTREQ round trips between the client and DBSPJ on the data nodes. (Bug #34768353)
DBSPJ now implements the firstMatch optimization for semijoins and antijoins, such as those found in EXISTS and NOT EXISTS subqueries. (Bug #34768191)
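For illustration, queries of the following forms (a sketch using hypothetical tables t1 and t2) are the kind the optimizer can execute as a semijoin or antijoin, and so can benefit from firstMatch in DBSPJ:

-- Semijoin: each t1 row needs at most one matching t2 row.
SELECT * FROM t1
  WHERE EXISTS (SELECT 1 FROM t2 WHERE t2.a = t1.k);

-- Antijoin: a t1 row is kept only if no matching t2 row exists.
SELECT * FROM t1
  WHERE NOT EXISTS (SELECT 1 FROM t2 WHERE t2.a = t1.k);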
When the DBSPJ block sends SCAN_FRAGREQ and SCAN_NEXTREQ signals to the data nodes, it tries to determine the optimum number of fragments to scan in parallel without starting more parallel scans than needed to fill the available batch buffers, thus avoiding any need to send additional SCAN_NEXTREQ signals to complete the scan of each fragment.

The DBSPJ block's statistics module calculates and samples the parallelism which was optimal for fragment scans just completed, for each completed SCAN_FRAGREQ, providing a mean and standard deviation of the sampled parallelism. From these it is possible to calculate a lower 95th percentile of the parallelism (and batch size) needed to complete a SCAN_FRAGREQ without additional SCAN_NEXTREQ signals.

It was found that the parallelism statistics seemed unable to provide a stable parallelism estimate and that the standard deviation was unexpectedly high. This often led to the parallelism estimate being a negative number (always rounded up to 1).
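To see why a large standard deviation yields a negative estimate, suppose (as an illustration only; the note does not specify the exact estimator) that the lower bound is computed as a one-sided normal-approximation percentile:

\hat{p}_{0.05} \approx \mu - 1.645\,\sigma

With \sigma large relative to \mu, this value drops below zero and is then rounded up to the minimum parallelism of 1.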
The flaw in the statistics calculation was found to be an underlying assumption that each sampled SCAN_FRAGREQ contained the same number of key ranges to be scanned, which is not necessarily the case. Typically a full batch of rows is returned for the first SCAN_FRAGREQ, and relatively few rows for the final SCAN_NEXTREQ returning the remaining rows; this resulted in wide variation in parallelism samples, which made the statistics obtained from them unreliable.

We fix this by basing the statistics on the number of keys actually sent in the SCAN_FRAGREQ, and counting the rows returned from this request. Based on this, record-per-key statistics can be calculated and sampled, which makes it possible to calculate the number of fragments that can be scanned without overflowing the batch buffers. (Bug #34768106)
It was possible in certain cases that both the NDB binary logging thread and metadata synchronization attempted to synchronize the ndb_apply_status table, which led to a race condition. We fix this by making sure that the ndb_apply_status table is monitored and created (or re-created) by the binary logging thread only. (Bug #34750992)
While starting a schema operation, the client is responsible for detecting timeouts until the coordinator has received the first schema event; from that point, any schema operation timeout should be detected by the coordinator. A problem occurred while the client was checking the timeout; it mistakenly set the state indicating that a timeout had occurred, which caused the coordinator to ignore any first schema event taking longer than approximately one second to receive (that is, for the event to be written, sent, and handled in the binary logging thread). This had the effect that, in these cases, the coordinator was not involved in the schema operation.

We fix this by changing the schema distribution timeout checking to be atomic, and by letting it be performed by either the client or the coordinator. In addition, we remove the state variable used for keeping track of events received by the coordinator, and rely on the list of participants instead. (Bug #34741743)
An SQL node did not start up correctly after restoring data with ndb_restore, such that, when it was otherwise ready to accept connections, the binary log injector thread never became ready. It was found that, when a mysqld was started after a data node initial restore from which new table IDs were generated, the utility tables' (ndb_*) MySQL data dictionary definitions might not match the NDB dictionary definitions. The existing mysqld definition is dropped by name, thus removing the unique ndbcluster-ID key in the MySQL data dictionary, but the new table ID could also already be occupied by another (stale) definition. The resulting mismatch prevented setup of the binary log.

To fix this problem we now explicitly drop any ndbcluster-ID definitions that might clash in such cases with the table being installed. (Bug #34733051)
After receiving a SIGTERM signal, ndb_mgmd did not wait for all threads to shut down before exiting. (Bug #33522783)
References: See also: Bug #32446105.
When multiple operations are pending on a single row, it is not possible to commit an operation which is run concurrently with an operation which is pending abort. This could lead to data node shutdown during the commit operation in DBACC, which could manifest when a single transaction contained more than MaxDMLOperationsPerTransaction DML operations.

In addition, a transaction containing insert operations is rolled back if a statement that uses a locking scan on the prepared insert fails due to too many DML operations. This could lead to an unplanned data node shutdown during tuple deallocation due to a missing reference to the expected DBLQH deallocation operation.

We solve this issue by allowing commit of a scan operation in such cases, in order to release locks previously acquired during the transaction. We also add a new special case for this scenario, so that the deallocation is performed in a single phase, and DBACC tells DBLQH to deallocate immediately; in DBLQH, execTUP_DEALLOCREQ() is now able to handle this immediate deallocation request. (Bug #32491105)
References: See also: Bug #28893633, Bug #32997832.
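A sketch of the kind of statement that can exceed the per-transaction DML limit, assuming hypothetical NDB tables t1 and t2 with identical definitions, where t1 holds more rows than MaxDMLOperationsPerTransaction allows:

-- Each inserted row counts as one DML operation, so this single
-- statement exceeds MaxDMLOperationsPerTransaction, fails, and the
-- transaction is rolled back.
INSERT INTO t2 SELECT * FROM t1;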
Cluster nodes sometimes reported Failed to convert connection to transporter warnings in logs, even when this was not really necessary. (Bug #14784707)
When started with no connection string on the command line, ndb_waiter printed Connecting to mgmsrv at (null). Now in such cases, it prints Connecting to management server at nodeid=0,localhost:1186 if no other default host is specified.

The --help option and other ndb_waiter program output was also improved. (Bug #12380163)
NdbSpin_Init() calculated the wrong number of loops in NdbSpin, and contained logic errors. (Bug #108448, Bug #32497174, Bug #32594825)
References: See also: Bug #31765660, Bug #32413458, Bug #102506, Bug #32478388.