11 Changes in MySQL NDB Cluster 7.6.28 (5.7.44-ndb-7.6.28) (2023-10-26, General Availability)

MySQL NDB Cluster 7.6.28 is a new release of NDB 7.6, based on MySQL Server 5.7 and including features in version 7.6 of the NDB storage engine, as well as fixing recently discovered bugs in previous NDB Cluster releases.

Obtaining NDB Cluster 7.6. NDB Cluster 7.6 source code and binaries can be obtained from https://dev.mysql.com/downloads/cluster/.

For an overview of changes made in NDB Cluster 7.6, see What is New in NDB Cluster 7.6.

This release also incorporates all bug fixes and changes made in previous NDB Cluster releases, as well as all bug fixes and feature changes which were added in mainline MySQL 5.7 through MySQL 5.7.44 (see Changes in MySQL 5.7.44 (2023-10-25, General Availability)).

Bugs Fixed

InnoDB: The last detected deadlock section of the engine status log was only showing 1024 characters for the combined thread and query information. Fixing by removing the printed query string limit. (Bug #80927, Bug #23036096)
NDB Replication: Updates to primary keys of character types were not correctly represented in the BEFORE and AFTER trigger values sent to the NDB binary log injector. This issue was previously fixed in part, but it was discovered subsequently that the problem still occurred when the mysqld was run with the binary logging options having the values listed here:
- --ndb-log-update-minimal=ON
- --ndb-log-update-as-write=OFF
The minimal binary log format excluded all primary key columns from the AFTER values reflecting the updated row, the rationale for this being a flawed assumption that the primary key remained constant when an update trigger was received. This did not take into account the fact that, if the primary key uses a character data type, an update trigger is received if character columns are updated to values treated as equal by the comparison rules of the collation used.
To be able to replicate such changes, we need to include them in the AFTER values; this fix ensures that we do so. (Bug #34540016)
References: See also: Bug #27522732, Bug #34312769, Bug #34388068.
NDB Cluster APIs: Ndb::pollEvents2() did not set NDB_FAILURE_GCI (~(Uint64)0) to indicate cluster failure. (Bug #35671818)
References: See also: Bug #31926584. This issue is a regression of: Bug #18753887.
NDB Client Programs: When ndb_select_all failed to read all data from the table, it always tried to re-read it. This could lead to the two problems listed here:
- Returning a non-empty partial result eventually led to spurious reports of duplicate rows.
- The table header was printed on every retry.
Now when ndb_select_all is unsuccessful at reading all the table data, its behavior is as follows:
- When the result is non-empty, ndb_select_all halts with an error (and does not retry the scan of the table).
- When the result is empty, ndb_select_all retries the scan, reusing the old header.
(Bug #35510814)
Following a node connection failure, the transporter registry's error state was not cleared before initiating a reconnect, which meant that the error causing the connection to be disconnected originally might still be set; this was interpreted as a failure to reconnect. (Bug #35774109)
When a TransporterRegistry (TR) instance connects to a management server, it first uses the MGM API, and then converts the connection to a Transporter connection for further communication. The initial connection had an excessively long timeout (60 seconds) so that, in the case of a cluster having two management servers where one was unavailable, clients were forced to wait until this management server timed out before being able to connect to the available one.
We fix this by setting the MGM API connection timeout to 5000 milliseconds, which is equal to the timeout used by the TR for getting and setting dynamic ports. (Bug #35714466)
Values for causes of conflicts used in conflict resolution exceptions tables were misaligned such that the order of ROW_ALREADY_EXISTS and ROW_DOES_NOT_EXIST was reversed. (Bug #35708719)
Improved NDBFS debugging output for bad requests. (Bug #35500304)
References: This issue is a regression of: Bug #28922609.
When other events led to NDBFS dumping requests to the log, some of the names of the request types were printed as Unknown action. (Bug #35499931)
ndb_restore did not update compare-as-equal primary key values changed during backup. (Bug #35420131)
In cases where the distributed global checkpoint (GCP) protocol stops making progress, this is detected and optionally handled by the GCP monitor, with handling as determined by the TimeBetweenEpochsTimeout and TimeBetweenGlobalCheckpointsTimeout data node parameters.
The LCP protocol is mostly node-local, but depends on the progress of the GCP protocol at the end of a local checkpoint (LCP); this means that, if the GCP protocol stalls, LCPs may also stall in this state. If the LCP watchdog detects that the LCP is stalled in this end state, it should defer to the GCP monitor to handle this situation, since the GCP Monitor is distribution-aware.
If no GCP monitor limit is set (TimeBetweenEpochsTimeout is equal 0), no handling of GCP stalls is performed by the GCP monitor. In this case, the LCP watchdog was still taking action which could eventually lead to cluster failure; this fix corrects this misbehavior so that the LCP watchdog no longer takes any such action. (Bug #29885899)
Previously, when a timeout was detected during transaction commit and completion, the transaction coordinator (TC) switched to a serial commit-complete execution protocol, which slowed commit-complete processing for large transactions, affecting GCP_COMMIT delays and epoch sizes. Instead of switching in such cases, the TC now continues waiting for parallel commit-complete, periodically logging a transaction summary, with states and nodes involved. (Bug #22602898)
References: See also: Bug #35260944.