MySQL NDB Cluster 8.0 Release Notes
NDB Cluster APIs: The version of Node.js used by NDB has been upgraded to 12.20.1. (Bug #32356419)
ndbinfo Information Database: Added the dict_obj_tree table to the ndbinfo information database. This table provides information about NDB database objects similar to what is shown by the dict_obj_info table, but presents it in a hierarchical or tree-like fashion that simplifies seeing relationships between objects such as: tables and indexes; tablespaces and data files; log file groups and undo log files.
An example of such a view of a table t1, having a primary key on column a and a unique key on column b, is shown here:
mysql> SELECT indented_name FROM ndbinfo.dict_obj_tree
    -> WHERE root_name = 'test/def/t1';
+----------------------------+
| indented_name              |
+----------------------------+
| test/def/t1                |
|   -> sys/def/13/b          |
|     -> NDB$INDEX_15_CUSTOM |
|   -> sys/def/13/b$unique   |
|     -> NDB$INDEX_16_UI     |
|   -> sys/def/13/PRIMARY    |
|     -> NDB$INDEX_14_CUSTOM |
+----------------------------+
7 rows in set (0.15 sec)
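For reference, the table t1 queried in this example could have been created with a statement such as the following; the internal index names shown in the output are assigned automatically by NDB and vary between installations:
mysql> CREATE TABLE test.t1 (
    ->     a INT NOT NULL PRIMARY KEY,
    ->     b INT,
    ->     UNIQUE KEY (b)
    -> ) ENGINE=NDBCLUSTER;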
For additional information and examples, see The ndbinfo dict_obj_tree Table. (Bug #32198754)
ndbinfo Information Database: Added the backup_id table to the ndbinfo information database. This table contains a single column (id) and a single row, in which the column value is the backup ID of the most recent backup of the cluster taken with the ndb_mgm client. If no NDB backups can be found, the value is 0.
Selecting from this table replaces the process of obtaining this information by using the ndb_select_all utility to dump the contents of the internal SYSTAB_0 table, which is error-prone and can require an excessively long time to complete.
(Bug #32073640)
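A minimal example of querying this table from the mysql client is shown here; the backup ID value shown is illustrative only:
mysql> SELECT id FROM ndbinfo.backup_id;
+------+
| id   |
+------+
|  984 |
+------+
1 row in set (0.03 sec)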
Added the status variable Ndb_config_generation, which shows the generation number of the current configuration being used by the cluster. This can be used as an indicator to determine whether the configuration of the cluster has changed.
(Bug #32247424)
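You can check the current value from the mysql client, as shown in this example (the generation number shown is illustrative):
mysql> SHOW GLOBAL STATUS LIKE 'Ndb_config_generation';
+-----------------------+-------+
| Variable_name         | Value |
+-----------------------+-------+
| Ndb_config_generation | 1     |
+-----------------------+-------+
1 row in set (0.01 sec)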
NDB Cluster now uses the MySQL
host_application_signal
component service to
perform shutdown of SQL nodes.
(Bug #30535835, Bug #32004109)
NDB has implemented the following two improvements in the calculation of index statistics:
Previously, index statistics were collected from a single fragment only; this has been changed so that additional fragments are now used as well.
The algorithm used for very small tables, such as those having very few rows whose results were formerly discarded, has been improved, so that estimates for such tables should now be more accurate than previously.
See NDB API Statistics Counters and Variables for more information. (WL #13144)
A number of NDB Cluster programs now support input of the password for encrypting or decrypting an NDB backup from standard input. Changes relating to each program affected are listed here; a combined example appears following this list:
For ndb_restore, the --backup-password-from-stdin option introduced in this release enables input of the password in a secure fashion, similar to how it is done by the mysql client's --password option. Use this option together with the --decrypt option.
ndb_print_backup_file now also supports --backup-password-from-stdin as the long form of the existing -P option.
For ndb_mgm, --backup-password-from-stdin is supported together with --execute "START BACKUP [options]" for starting an encrypted cluster backup from the system shell, and has the same effect.
Two options for ndbxfrm, --encrypt-password-from-stdin and --decrypt-password-from-stdin, which are also introduced in this release, cause similar behavior when using this program, respectively, to encrypt or to decrypt a backup file.
In addition, you can cause ndb_mgm to use encryption whenever it creates a backup by starting it with --encrypt-backup. In this case, the user is prompted for a password when invoking START BACKUP if none is supplied. This option can also be specified in the [ndb_mgm] section of the my.cnf file.
Also, the behavior and syntax of the ndb_mgm management client START BACKUP command are changed slightly, such that it is now possible to use the ENCRYPT option without also specifying PASSWORD. When the user does this, the management client prompts the user for a password.
For more information, see the descriptions of the NDB Cluster programs and program options just mentioned, as well as Online Backup of NDB Cluster. (WL #14259)
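The following sketch illustrates how these options might be used together from the system shell; the password, node ID, backup ID, and backup path shown here are placeholders only, and the programs read the password from standard input as piped here (or securely from the terminal when run interactively):
shell> echo 'MyBackupPassword' | ndb_mgm --backup-password-from-stdin \
          --execute "START BACKUP ENCRYPT"
shell> echo 'MyBackupPassword' | ndb_restore --decrypt --backup-password-from-stdin \
          --nodeid=1 --backupid=1 --restore-data \
          --backup-path=/var/lib/mysql-cluster/BACKUP/BACKUP-1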
Packaging: The mysql-cluster-community-server-debug and mysql-cluster-commercial-server-debug RPM packages were dependent on mysql-community-server and mysql-commercial-server, respectively, instead of mysql-cluster-community-server and mysql-cluster-commercial-server. (Bug #32683923)
Packaging: RPM upgrades from NDB 7.6.15 to 8.0.22 did not succeed, due to a file having been moved from the server RPM to the client-plugins RPM. (Bug #32208337)
Linux: On Linux systems, NDB interpreted memory sizes obtained from /proc/meminfo as being supplied in bytes rather than kilobytes. (Bug #102505, Bug #32474829)
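For reference, /proc/meminfo reports most sizes in kilobytes, as indicated by the unit suffix in its output; for example (the value shown here varies by system):
shell> grep MemTotal /proc/meminfo
MemTotal:       16384256 kB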
Microsoft Windows: Removed several warnings which were generated when building NDB Cluster on Windows using Microsoft Visual Studio 2019. (Bug #32107056)
Microsoft Windows: NDB failed to start correctly on Windows when initializing the NDB library with ndb_init(), with the error Failed to find CPU in CPU group.
This issue was due to how Windows works with regard to assigning
processes to CPUs: when there are more than 64 logical CPUs on a
machine, Windows divides them into different processor groups
during boot. Each processor group can at most hold 64 CPUs; by
default, a process can be assigned to only one processor group.
The function std::thread::hardware_concurrency() was used to get the maximum number of logical CPUs on the machine, but on Windows this function returns only the maximum number of logical CPUs present in the processor group with which the current process is affiliated. This value was then used to allocate memory for an array holding hardware information about each CPU on the machine. Since the array held valid memory for CPUs from only one processor group, any attempt to store or retrieve hardware information about a CPU in a different processor group led to out-of-bounds reads and writes on this array, causing memory corruption and ultimately process failure.
This is fixed by using GetActiveProcessorCount() instead of the hardware_concurrency() function referenced previously.
(Bug #101347, Bug #32074703)
Solaris: While preparing NDBFS for handling of encrypted backups, activation of O_DIRECT was suspended until after initialization of files was completed. This caused initialization of redo log files to require an excessive amount of time on systems using hard disk drives with ext3 file systems.
On Solaris, directio is used instead of O_DIRECT; activating directio prior to initialization of files caused a notable increase in the time required when using hard disk drives with UFS file systems.
Now we ensure that, on systems having O_DIRECT, this is activated before initialization of files, and that, on Solaris, directio continues to be activated after initialization of files.
(Bug #32187942)
NDB Cluster APIs: Several NDB API coding examples included in the source did not release all resources allocated. (Bug #31987735)
NDB Cluster APIs: Some internal dictionary objects in NDB used an internal name format which depends on the database name of the Ndb object. This dependency has been made more explicit where necessary, and otherwise removed.
Users of the NDB API should be aware that the fullyQualified argument to Dictionary::listObjects() still works in such a way that specifying it as false causes the objects in the list it returns to use fully qualified names.
(Bug #31924949)
ndbinfo Information Database: The system variables ndbinfo_database and ndbinfo_table_prefix are intended to be read-only. It was found to be possible to set mysqld command-line options corresponding to either or both of these; doing so caused the ndbinfo database to malfunction. This fix ensures that it is no longer possible to set either of these variables in the mysql client or from the command line.
(Bug #23583256)
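Both variables remain readable; attempting to set one of them at runtime fails because it is read-only, as shown in this sketch (the exact error text can vary between versions):
mysql> SELECT @@ndbinfo_database;
+--------------------+
| @@ndbinfo_database |
+--------------------+
| ndbinfo            |
+--------------------+
1 row in set (0.00 sec)

mysql> SET GLOBAL ndbinfo_database = 'myinfo';
ERROR 1238 (HY000): Variable 'ndbinfo_database' is a read only variable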
In some cases, a query affecting a user with the NDB_STORED_USER privilege could be printed to the MySQL server log without being rewritten. Now such queries are omitted or rewritten to remove any text following the keyword IDENTIFIED.
(Bug #32541096)
The value set for the SpinMethod data node configuration parameter was ignored.
(Bug #32478388)
The compile-time debug flag DEBUG_FRAGMENT_LOCK was enabled by default. This caused increased resource usage by DBLQH, even for release builds.
This is fixed by disabling DEBUG_FRAGMENT_LOCK by default.
(Bug #32459625)
ndb_mgmd now exits gracefully in the event of a SIGTERM, just as it does following a management client SHUTDOWN command.
(Bug #32446105)
When started on a port which was already in use, ndb_mgmd did not report any error, since the use of SO_REUSEADDR on Windows platforms allowed multiple sockets to bind to the same address and port.
To take care of this issue, we replace SO_REUSEADDR with SO_EXCLUSIVEADDRUSE, which prevents reuse of a port that is already in use.
(Bug #32433002)
Encountering an error in detection of an initial system restart of the cluster caused the SQL node to exit prematurely. (Bug #32424580)
Under some circumstances, when trying to measure the time of a CPU pause, an elapsed time of zero could result. In addition, computing the average for a very fast spin (for example, 100 loops taking less than 100 ns) could yield zero nanoseconds. In both cases, this caused the spin calibration algorithm to throw an arithmetic exception due to division by zero.
We fix both issues by modifying the algorithm so that it ignores zero values when computing mean spin time. (Bug #32413458)
References: See also: Bug #32497174.
Table and database names were not formatted correctly in the messages written to the mysqld error log when the internal method Ndb_rep_tab_reader::scan_candidates() found ambiguous matches for a given database, table, or server ID in the ndb_replication table.
(Bug #32393245)
Some queries with nested pushed joins were not processed correctly. (Bug #32354817)
When ndb_mgmd allocates a node ID, it reads through the configuration to find a suitable ID, causing a mutex to be held while performing hostname lookups. Because network address resolution can require large amounts of time, it is not considered good practice to hold such a mutex or lock while performing network operations.
This issue is fixed by building a list of configured nodes while holding the mutex, then using the list to perform hostname matching and other logic. (Bug #32294679)
The schema distribution participant failed to start a global
checkpoint after writing a reply to the
ndb_schema_result
table, which caused an
unnecessary delay before the coordinator received events from
the participant notifying it of the result.
(Bug #32284873)
The global DNS cache used in ndb_mgmd caused stale lookups when restarting a node on a new machine with a new IP address, which meant that the node could not allocate a node ID.
This issue is addressed by the following changes:
Node ID allocation no longer depends on LocalDnsCache
DnsCache now uses local scope only
(Bug #32264914)
ndb_restore generated a core file when started with unknown or invalid arguments. (Bug #32257374)
Auto-synchronization detected the presence of mock foreign key tables in the NDB dictionary and attempted to re-create them in the MySQL server's data dictionary, although these should remain internal to the NDB Dictionary and not be exposed to the MySQL server. To fix this issue, we now ensure that the NDB Cluster auto-synchronization mechanism ignores any such mock tables. (Bug #32245636)
Improved resource usage associated with handling of cluster configuration data. (Bug #32224672)
Removed left-over debugging printouts from ndb_mgmd showing a client's version number upon connection. (Bug #32210216)
References: This issue is a regression of: Bug #30599413.
The backup abort protocol for handling of node failures did not function correctly for single-threaded data nodes (ndbd). (Bug #32207193)
While retrieving sorted results from a pushed-down join using ORDER BY with the index access method (and without filesort), an SQL node sometimes unexpectedly terminated.
(Bug #32203548)
Logging of redo log initialization showed log part indexes rather than log part numbers. (Bug #32200635)
Signal data was overwritten (and lost) due to use of extended signal memory as temporary storage. Now in such cases, extended signal memory is not used in this fashion. (Bug #32195561)
When ClassicFragmentation = 1, the default number of partitions per node (shown in ndb_desc output as PartitionCount) is calculated using the lowest number of LDM threads employed by any single live node. This calculation was performed only once, and was not repeated even after data nodes left or joined the cluster, possibly with a new configuration changing the LDM thread count and thus the default partition count. Now in such cases, we make sure the default number of partitions per node is recalculated each time data nodes join or leave the cluster.
This is not an issue in NDB 8.0.23 and later, when ClassicFragmentation is set to 0.
(Bug #32183985)
The internal function Ndb_ReloadHWInfo() is responsible for updating hardware information for all the CPUs on the host. For the Linux ARM platform, which does not have Level 3 cache information, this function assigned a socket ID for the L3 cache ID but failed to record the value in the global variable num_shared_l3_caches, which is needed when creating lists of CPUs connected to a shared L3 cache.
(Bug #32180383)
When trying to run two management nodes on the same host and using the same port number, it was not always obvious to users why they did not start. Now in such cases, in addition to writing a message to the error log, an error message Same port number is specified for management nodes node_id1 and node_id2 (or) they both are using the default port number on same host host_name is also written to the console, making the source of the issue more immediately apparent.
(Bug #32175157)
Added a --cluster-config-suffix option for ndb_mgmd and ndb_config, for use in internal testing to override a defaults group suffix.
(Bug #32157276)
The management server returned the wrong status for host name matching when some of the host names in the configuration did not resolve and a client trying to allocate a node ID connected from the host whose host name resolved to a loopback address, with the error Could not alloc node id at <host>:<port>: Connection with id X done from wrong host ip 127.0.0.1, expected <unresolvable_host> (lookup failed).
This caused the connecting client to fail the node ID allocation.
This issue is fixed by rewriting the internal match_hostname() function so that it contains all logic for how the requesting client address should match the configured hostnames, and so that it first checks whether the configured host name can be resolved; if not, it now returns a special error so that the client receives an error indicating that node ID allocation can be retried. The new error is Could not alloc node id at <host>:<port>: No configured host found of node type <type> for connection from ip 127.0.0.1. Some hostnames are currently unresolvable. Can be retried. (Bug #32136993)
The internal function ndb_socket_create_dual_stack() did not close a newly created socket when a call to ndb_setsockopt() was unsuccessful.
(Bug #32105957)
The local checkpoint (LCP) mechanism was changed in NDB 7.6 such that it also detected idle fragments—that is, fragments which had not changed since the last LCP and thus required no on-disk metadata update. The LCP mechanism could then immediately proceed to handle the next fragment. When there were a great many such idle fragments, the CPU consumption required merely to loop through these became highly significant, causing latency spikes in user transactions.
A 1 ms delay was already inserted between each such idle fragment being handled. Testing later showed this to be too short an interval, and that we are normally not in as great a hurry to complete these idle fragments as we previously believed.
This fix extends the idle fragment delay time to 20 ms if there are no redo alerts indicating an urgent need to complete the LCP. In case of a low redo alert state we wait 5 ms instead, and for a higher alert state we fall back to the 1 ms delay. (Bug #32068551)
References: See also: Bug #31655158, Bug #31613158.
When an NDB table was created, it was invalidated in the global dictionary cache, but this was unnecessary. Furthermore, having the table present in the global dictionary cache is actually an advantage for subsequent uses of the new table, since it can then be found in the table cache without performing a round trip to NDB.
(Bug #32047456)
No clear error message was provided when an ndb_mgmd process tried to start using a PortNumber that was already in use.
(Bug #32045786)
Two problems occurred when NDB closed a table:
NDB failed to detect when the close was done from FLUSH TABLES, which meant that the NDB table definitions in the global dictionary cache were not invalidated.
When the close was done by a thread which had not used NDB earlier (for example, when FLUSH TABLES or RESET MASTER closed instances of ha_ndbcluster held in the table definition cache), a new Thd_ndb object was allocated. There is already a fallback to the global Ndb object for the case in which this allocation fails; since allocation never fails in such cases, it is less wasteful simply to use the global object already provided.
(Bug #32018394, Bug #32357856)
Removed a large number of compiler warnings relating to unused function arguments in NdbDictionaryImpl.
(Bug #31960757)
Unnecessary casts were performed when checking internal error codes. (Bug #31930166)
NDB continued to use file system paths for determining the names of tables to open or perform DDL on, in spite of the fact that it no longer actually uses files for these operations. This required unnecessary translation between character sets, handling of the MySQL-specific file system encoding, and parsing. In addition, the results of these operations were stored in buffers of fixed size, each instance of which used several hundred bytes of memory unnecessarily. Since the database and table names to use are already available to NDB through other means, this translation could be (and has been) removed in most cases.
(Bug #31846478)
Generation of internal statistics relating to NDB object counts was found to lead to an increase in transaction latency at very high rates of transactions per second, brought about by returning an excessive number of freed NDB objects.
(Bug #31790329)
NDB behaved unpredictably in response to an attempt to change permissions for a distributed user (that is, a user having the NDB_STORED_USER privilege) during a binary log thread shutdown and restart. We address this issue by ensuring that the user gets a clear warning Could not distribute ACL change to other MySQL servers whenever distribution does not succeed. This fix also improves a number of mysqld log messages.
(Bug #31680765)
ndb_restore encountered intermittent errors while replaying backup logs which deleted blob values; this was due to deletion of blob parts when a main table row containing one or more blob values was deleted. This is fixed by modifying ndb_restore to use the asynchronous API for blob deletes, which (unlike the synchronous API) does not trigger blob part deletes when a blob main table row is deleted, so that a delete log event for the main table deletes only the row from the main table. (Bug #31546136)
When a table creation schema transaction is prepared, the table is in TS_CREATING state, and is changed to TS_ACTIVE state when the schema transaction commits on the DBDIH block.
In the case where the node acting as DBDIH coordinator fails while the schema transaction is committing, another node starts taking over for the coordinator. The following actions are taken when handling this node failure:
DBDICT rolls the table creation schema transaction forward and commits, resulting in the table involved changing to TS_ACTIVE state.
DBDIH starts removing the failed node from tables by moving active table replicas on the failed node from a list of stored fragment replicas to another list.
These actions are performed asynchronously many times, and their interleaving may cause a race condition. As a result, the replica list in which the replica of a failed node resides becomes nondeterministic and may differ between the recovering node (that is, the new coordinator) and the other DIH participant nodes. This difference violated the requirement of knowing in which list the failed node's replicas can be found during recovery of the failed node on the other participants.
To fix this, moving active table replicas now covers not only tables in TS_ACTIVE state, but those in TS_CREATING (prepared) state as well, since the prepared schema transaction is always rolled forward.
In addition, the state of a table creation schema transaction which is being aborted is now changed from TS_CREATING or TS_IDLE to TS_DROPPING, to avoid any race condition there.
(Bug #30521812)
START BACKUP SNAPSHOTSTART WAIT STARTED could return control to the user prior to the backup's restore point from the user's point of view; that is, the Backup started notification was sent before waiting for the synchronizing global checkpoint (GCP) boundary. This meant that transactions committed after receiving the notification might be included in the restored data.
To fix this problem, START BACKUP now sends a notification to the client that the backup has started only after the GCP has truly started.
(Bug #29344262)
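For reference, such a backup can be started from the system shell as shown here (a sketch; the connection string is a placeholder):
shell> ndb_mgm -c mgmhost:1186 -e 'START BACKUP SNAPSHOTSTART WAIT STARTED'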
Upgrading to NDB Cluster 8.0 from a prior release includes an
upgrade in the schema distribution mechanism, as part of which
the ndb_schema
table is dropped and recreated
in a way which causes all MySQL Servers connected to the cluster
to restart their binary log injector threads, causing a gap
event to be written to the binary log. Since the thread restart
happens at the same time on all MySQL Servers, no binary log
spans the time during which the schema distribution
functionality upgrade was performed, which breaks NDB Cluster
Replication.
This issue is fixed by adding support for gracefully
reconstituting the schema distribution tables while allowing the
injector thread to continue processing changes from the cluster.
This is implemented by handling the DDL event notification for
DROP TABLE
to turn off support
for schema distribution temporarily, and to start regular checks
to re-create the tables. When the tables have been successfully
created again, the regular checks are turned off and support for
schema distribution is turned back on.
NDB
also now detects automatically when the
ndb_apply_status
table has been dropped and
re-creates it. The drop and re-creation leaves a gap event in
the binary log, which in a replication setup causes the replica
MySQL Server to stop applying changes from the source until the
replication channel is restarted (see
ndb_apply_status Table).
In addition, the minimum version required to perform the schema distribution upgrade is raised to 8.0.24, which prevents automatic triggering of the schema distribution upgrade until all connected API nodes support the new upgrade procedure.
For more information, see NDB Cluster Replication Schema and Tables. (Bug #27697409, Bug #30877233)
References: See also: Bug #30876990.
Fixed a number of issues uncovered when trying to build NDB with GCC 6.
(Bug #25038373)
Calculation of the redo alert state based on redo log usage was overly aggressive, and thus incorrect, when using more than 1 log part per LDM.