MySQL NDB Cluster 8.0 Release Notes
Important Change: As part of the terminology changes begun in MySQL 8.0.21 and NDB 8.0.21, the ndb_slave_conflict_role system variable is now deprecated and is being replaced with ndb_conflict_role.
In addition, a number of status variables have been deprecated and are being replaced, as shown in the following table:
Table 2 Deprecated NDB status variables and their replacements
Also as part of this work, the ndbinfo.table_distribution_status table's tab_copy_status column values ADD_TABLE_MASTER and ADD_TABLE_SLAVE are deprecated, and are replaced by ADD_TABLE_COORDINATOR and ADD_TABLE_PARTICIPANT, respectively.
Finally, the --help output of some NDB utility programs, such as ndb_restore, has been updated. (Bug #31571031)
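For existing configurations, moving to the new variable name can be as simple as the following statement; this is a hedged illustration only, with the value PRIMARY chosen as one example of the variable's permitted values:

    -- Use the new variable name in place of the deprecated ndb_slave_conflict_role
    SET GLOBAL ndb_conflict_role = 'PRIMARY';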
NDB Client Programs: Effective with this release, the MySQL NDB Cluster Auto-Installer (ndb_setup.py) has been removed from the NDB Cluster binary and source distributions, and is no longer supported. (Bug #32084831)
References: See also: Bug #31888835.
ndbmemcache: ndbmemcache, which was deprecated in the previous release of NDB Cluster, has now been removed from NDB Cluster, and is no longer supported. (Bug #32106576)
As part of work previously done in NDB 8.0, the metadata check performed as part of auto-synchronization between the representation of an NDB table in the NDB dictionary and its counterpart in the MySQL data dictionary has been extended to include, in addition to table-level properties, the properties of columns, indexes, and foreign keys. (This check is also made by a debug MySQL server when performing a CREATE TABLE statement, and when opening an NDB table.)
As part of this work, any mismatches found between an object's properties in the NDB dictionary and the MySQL data dictionary are now written to the MySQL error log. The error log message includes the name of the property, its value in the NDB dictionary, and its value in the MySQL data dictionary. If the object is a column, index, or foreign key, the object's type is also indicated in the message. (WL #13412)
The ThreadConfig parameter has been extended with two new thread types, query threads and recovery threads, intended for scaleout of LDM threads. The number of query threads must be a multiple of the number of LDM threads, up to a maximum of 3 times the number of LDM threads.
It is also now possible when setting ThreadConfig to combine the main and rep threads into a single thread by setting either or both of these arguments to 0. When one of these arguments is set to 0 but the other remains set to 1, the resulting combined thread is named main_rep. When both are set to 0, they are combined with the recv thread (assuming that recv is set to 1), and this combined thread is named main_rep_recv. These thread names are those shown when checking the threads table in the ndbinfo information database.
In addition, the maximums for a number of existing thread types have been increased. The new maximums are: LDM threads: 332; TC threads: 128; receive threads: 64; send threads: 64; main threads: 2. (The maximums for query threads and recovery threads are 332 each.) Maximums for other thread types remain unchanged from previous NDB Cluster releases.
Another change related to this work causes NDB to employ mutexes for protecting job buffers when more than 32 block threads are in use. This may cause a slight decrease in performance (roughly 1 to 2 percent), but also results in a decrease in the amount of memory used by very large configurations. For example, a setup with 64 threads which previously used 2 GB of job buffer memory should now require only about 1 GB instead. In our testing this has resulted in an overall improvement (on the order of 5 percent) in the execution of very complex queries.
For more information, see the descriptions of the arguments to the ThreadConfig parameter discussed previously, and of the ndbinfo.threads table. (WL #12532, WL #13219, WL #13338)
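As a hedged illustration of the new query thread type, a config.ini setting along the following lines could be used; the thread counts shown are examples only and should be sized for the host, keeping the query count a multiple of the ldm count as noted above:

    [ndbd default]
    # 4 LDM threads plus 8 query threads (2 per LDM; at most 3 per LDM are allowed)
    # rep=0 combines the main and rep threads into a single main_rep thread
    ThreadConfig=ldm={count=4},query={count=8},tc={count=2},recv={count=2},send={count=2},main={count=1},rep={count=0}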
This release adds the possibility of configuring the threads for multithreaded data nodes (ndbmtd) automatically, by implementing a new data node configuration parameter AutomaticThreadConfig. When set to 1, NDB sets up the thread assignments automatically, based on the number of processors available to applications. If the system does not limit the number of processors, you can do this by setting NumCPUs to the desired number. Automatic thread configuration makes it unnecessary to set any values for ThreadConfig or MaxNoOfExecutionThreads in config.ini; if AutomaticThreadConfig is enabled, settings for either of these parameters are not used.
As part of this work, a set of tables providing information about hardware and CPU availability and usage by NDB data nodes have been added to the ndbinfo information database. These tables, along with a brief description of the information provided by each, are listed here:
cpudata: CPU usage during the past second
cpudata_1sec: CPU usage per second over the past 20 seconds
cpudata_20sec: CPU usage per 20-second interval over the past 400 seconds
cpudata_50ms: CPU usage per 50-millisecond interval during the past second
cpuinfo: The CPU on which the data node executes
hwinfo: The hardware on the host where the data node resides
Not all of the tables listed are available on all platforms supported by NDB Cluster:
The cpudata, cpudata_1sec, cpudata_20sec, and cpudata_50ms tables are available only on Linux and Solaris operating systems.
The cpuinfo table is not available on FreeBSD or macOS.
(WL #13980)
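A minimal config.ini sketch enabling this feature is shown here; the NumCPUs value of 8 is illustrative, and the parameter is needed only when the operating system does not itself restrict the processors available to the data node:

    [ndbd default]
    # Let NDB assign thread types and counts automatically
    AutomaticThreadConfig=1
    # Optional cap on the number of processors considered
    NumCPUs=8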
Added statistical information to the DBLQH block, which is used to track key lookups and scans, as well as queries coming from DBTC and DBSPJ. By detecting situations in which the load is high but there is no actual need to decrease the number of rows scanned per realtime break, rather than checking the size of job buffer queues to decide how many rows to scan, this makes it possible to scan more rows when there is no CPU congestion. This helps improve performance and realtime behaviour when handling high loads. (WL #14081)
A new method for handling table partitions and fragments is introduced, such that the number of local data managers (LDMs) for a given data node can be determined independently of the number of redo log parts, and the number of LDMs can now be highly variable. NDB employs this method when the ClassicFragmentation data node configuration parameter, implemented as part of this work, is set to false. When this is done, the number of LDMs is no longer used to determine how many partitions to create for a table per data node; instead, the PartitionsPerNode parameter, also introduced in this release, determines this number, which is used for calculating how many fragments a table should have.
When ClassicFragmentation has its default value true, the traditional method of using the number of LDMs to determine how many fragments a table should have continues to be used.
For more information, see Multi-Threading Configuration Parameters (ndbmtd). (WL #13930, WL #14107)
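The following config.ini sketch shows how the two new parameters might be combined; the PartitionsPerNode value of 4 is purely illustrative:

    [ndbd default]
    # Stop deriving the per-node partition count from the number of LDMs
    ClassicFragmentation=false
    # Create this many partitions per data node instead
    PartitionsPerNode=4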
macOS: Removed a number of compiler warnings which occurred when building NDB for Mac OS X. (Bug #31726693)
Microsoft Windows: Removed compiler warning C4146 (unary minus operator applied to unsigned type, result still unsigned), raised by Visual Studio 2013 for storage\ndb\src\kernel\blocks\dbacc\dbaccmain.cpp. (Bug #23130016)
Solaris: Due to a source-level error, atomic_swap_32(), although intended for use in Solaris builds of NDB Cluster, was not actually used in such builds. (Bug #31765608)
NDB Cluster APIs: Removed redundant usage of strlen() in the implementation of NdbDictionary and related internal classes in the NDB API. (Bug #100936, Bug #31930362)
MySQL NDB ClusterJ: When a DomainTypeHandler was instantiated by a SessionFactory, it was stored locally in a static map, typeToHandlerMap. If multiple, distinct SessionFactories for separate connections to the data nodes were obtained by a ClusterJ application, the static typeToHandlerMap was shared by all those factories. When one of the SessionFactories was closed, the connections it created were closed and any tables opened by the connections were cleared from the NDB API global cache. However, the typeToHandlerMap was not cleared, and through it the other SessionFactories kept accessing the DomainTypeHandlers of tables that had already been cleared. These obsolete DomainTypeHandlers contained invalid NdbTable references, and any NDB API calls using those table references ended up with errors.
This patch fixes the issue by making the typeToHandlerMap and the related proxyInterfacesToDomainClassMap maps local to a SessionFactory, so that they are cleared when the SessionFactory is closed. (Bug #31710047)
MySQL NDB ClusterJ: Setting com.mysql.clusterj.connection.pool.size=0 made connections to an NDB Cluster fail. With this fix, setting com.mysql.clusterj.connection.pool.size=0 disables connection pooling as expected, so that every request for a SessionFactory results in the creation of a new factory, and separate connections to the cluster can be created using the same connection string. (Bug #21370745, Bug #31721416)
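A hedged sketch of a ClusterJ properties file using this setting follows; the connect string and database name are placeholders:

    # clusterj.properties (illustrative values)
    com.mysql.clusterj.connectstring=mgmhost:1186
    com.mysql.clusterj.database=test
    # 0 disables connection pooling; each SessionFactory then opens its own cluster connection
    com.mysql.clusterj.connection.pool.size=0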
When calling disk_page_abort_prealloc(), the callback from this internal function was ignored, and so removal of the operation record for the LQHKEYREQ signal proceeded without waiting. This left the table subject to removal before the callback had completed, leading to a failure in PGMAN when the page was retrieved from disk.
To avoid this, we add an extra usage count for the table especially for this page cache miss; this count is decremented as soon as the page cache miss returns. This means that we guarantee that the table is still present when returning from the disk read. (Bug #32146931)
When a table was created, it was possible for a fragment of the table to be checkpointed too early during the next local checkpoint. This meant that Prepare Phase LCP writes were still being performed when the LCP completed, which could lead to problems with subsequent ALTER TABLE statements on the table just created. Now we wait for any potential Prepare Phase LCP writes to finish before the LCP is considered complete. (Bug #32130918)
Using the maximum size of an index key supported by index statistics (3056 bytes) caused buffer issues in data nodes. (Bug #32094904)
References: See also: Bug #25038373.
NDB now prefers CLOCK_MONOTONIC, which on Linux is adjusted by frequency changes but is not updated during suspend. On macOS, NDB instead uses CLOCK_UPTIME_RAW, which behaves in the same way, except that it is not affected by any adjustments.
In addition, when initializing NdbCondition, the monotonic clock to use is taken directly from NdbTick, rather than re-executing the same preprocessor logic used by NdbTick. (Bug #32073826)
ndb_restore terminated unexpectedly when run with the --decrypt option on big-endian systems. (Bug #32068854)
When the data node receive thread found that the job buffer was too full to receive, nothing was done to ensure that, the next time it checked, it resumed receiving from the transporter at the same point at which it stopped previously. (Bug #32046097)
The metadata check failed during auto-synchronization of tables restored using the ndb_restore tool. This was a timing issue relating to indexes, and was found in the following two scenarios encountered when a table had been selected for auto-synchronization:
When the indexes had not yet been created in the NDB dictionary
When the indexes had been created, but were not yet usable
(Bug #32004637)
Optimized sending of packed signals by registering the kernel blocks affected and the sending functions which need to be called for each one in a data structure rather than looking up this information each time. (Bug #31936941)
When two data definition language statements—one on a database and another on a table in the same schema—were run in parallel, it was possible for a deadlock to occur. The DDL statement affecting the database acquired the global schema lock first, but before it could acquire a metadata lock on the database, the statement affecting the table acquired an intention-exclusive metadata lock on the schema. The table DDL statement was thus waiting for the global schema lock to upgrade its metadata lock on the table to an exclusive lock, while the database DDL statement waited for an exclusive metadata lock on the database, leading to a deadlock.
A similar type of deadlock involving tablespaces and tables was already known to occur; NDB already detected and resolved that issue. The current fix extends that logic to handle databases and tables as well, to resolve the problem. (Bug #31875229)
Clang 8 raised a warning due to an uninitialized variable. (Bug #31864792)
An empty page acquired for an insert did not receive a log sequence number. This is necessary in case the page was used previously and thus required undo log execution before being used again. (Bug #31859717)
No reason was provided when rejecting an attempt to perform an in-place ALTER TABLE ... ADD PARTITION statement on a fully replicated table. (Bug #31809290)
When the master node had recorded a more recent GCI than a node starting up which had performed an unsuccessful restart, subsequent restarts of the latter could not be performed because it could not restore the stated GCI. (Bug #31804713)
When using 3 or 4 fragment replicas, it is possible to add more than one node at a time, which means that DBLQH and DBDIH can have distribution keys based on numbers of fragment replicas that differ by up to 3 (that is, MAX_REPLICAS - 1), rather than by only 1. (Bug #31784934)
It was possible in DBLQH for an ABORT signal to arrive from DBTC before it received an LQHKEYREF signal from the next local query handler. Now in such cases, the out-of-order ABORT signal is ignored. (Bug #31782578)
NDB did not correctly handle the case in which an ALTER TABLE ... COMMENT="..." statement did not specify ALGORITHM=COPY. (Bug #31776392)
It was possible in some cases to miss the end point of undo logging for a fragment. (Bug #31774459)
ndb_print_sys_file did not work correctly with version 2 of the sysfile format that was introduced in NDB 8.0.18. (Bug #31726653)
References: See also: Bug #31828452.
DBLQH could not handle the case in which identical operation records having the same transaction ID came from different transaction coordinators. This led to locked rows persisting after a node failure, which kept node recovery from completing. (Bug #31726568)
It is possible for DBDIH to receive a local checkpoint having a given ID to restore while a later LCP is actually used instead, but when performing a partial LCP in such cases, the DIH block was not fully synchronized with the ID of the LCP used. (Bug #31726514)
In most cases, when searching a hash index, the row is used to read the primary key, but when the row has not yet been committed the primary key may be read from the copy row. If the row has been deleted, it can no longer be used to read the primary key. Previously in such cases, the primary key was treated as a NULL, but this could lead to making a comparison using uninitialised data.
Now when this occurs, the comparison is made only if the row has not been deleted; otherwise, a check for the primary key is made among the operations in the serial queue. If no operation has the primary key, then any comparison can be reported as not equal, since no entry in the parallel queue can reinsert the row. This check is needed because, if an entry in the serial queue is an insert, the primary key from this operation must be identified as such to preclude inserting the same primary key twice. (Bug #31688797)
As with writing redo log records, when the file currently used for writing global checkpoint records becomes full, writing switches to the next file. This switch is not supposed to occur until the new file is actually ready to receive the records, but no check was made to ensure that this was the case. This could lead to an unplanned data node shutdown when restoring data from a backup using ndb_restore. (Bug #31585833)
Release of shared global memory when it is no longer required by the DBSPJ block now occurs more quickly than previously. (Bug #31321518)
References: See also: Bug #31231286.
Stopping 3 nodes out of 4 in a single node group using kill -9 caused an unplanned cluster shutdown. To keep this from happening under such conditions, NDB now ensures that any node group that has not had any node failures is viewed by arbitration checks as fully viable. (Bug #31245543)
Multi-threaded index builds could sometimes attempt to use an internal function disallowed to them. (Bug #30587462)
While adding new data nodes to the cluster, and while the management node was restarting with an updated configuration file, some data nodes terminated unexpectedly with the error virtual void TCP_Transporter::resetBuffers(): Assertion `!isConnected()' failed. (Bug #30088051)
It was not possible to execute TRUNCATE TABLE or DROP TABLE for the parent table of a foreign key with foreign_key_checks set to 0. (Bug #97501, Bug #30509759)
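The sequence of statements now working as expected is sketched here; the table name parent is hypothetical and stands for any table referenced by a foreign key:

    SET foreign_key_checks = 0;
    -- Previously rejected for NDB when another table held a foreign key referencing parent
    TRUNCATE TABLE parent;
    SET foreign_key_checks = 1;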
Optimized the internal NdbReceiver::unpackNdbRecord() method, which is used to convert rows retrieved from the data nodes from packed wire format to the NDB API row format. Prior to the change, roughly 13% of CPU usage for executing a join occurred within this method; this was reduced to approximately 8%. (Bug #95007, Bug #29640755)