MySQL NDB Cluster 7.6 Release Notes
NDB Cluster APIs: Ndb::pollEvents2() did not set NDB_FAILURE_GCI (~(Uint64)0) to indicate cluster failure. (Bug #35671818)
References: See also: Bug #31926584. This issue is a regression of: Bug #18753887.
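As an illustration of the sentinel value named in this item, the following minimal C++ sketch shows how an application might test a reported epoch for the cluster-failure marker. The names Epoch, kFailureGci, and indicatesClusterFailure are hypothetical stand-ins chosen so that the sketch compiles without the NDB API headers; only the value ~(Uint64)0 is taken from the note itself.

```cpp
#include <cstdint>

// Stand-in for the NDB API's Uint64 type (assumption: 64-bit unsigned).
using Epoch = std::uint64_t;

// Hypothetical local name for the sentinel the note calls NDB_FAILURE_GCI,
// that is, all 64 bits set.
constexpr Epoch kFailureGci = ~static_cast<Epoch>(0);

// Returns true when an epoch value reported by event polling is the
// failure sentinel rather than a real epoch.
constexpr bool indicatesClusterFailure(Epoch reportedEpoch) {
    return reportedEpoch == kFailureGci;
}
```

In a real application the comparison would be made against NDB_FAILURE_GCI itself, using the epoch value obtained from the event API.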
NDB Client Programs: When ndb_select_all failed to read all data from the table, it always tried to re-read it. This could lead to the two problems listed here:
Returning a non-empty partial result eventually led to spurious reports of duplicate rows.
The table header was printed on every retry.
Now when ndb_select_all is unsuccessful at reading all the table data, its behavior is as follows:
When the result is non-empty, ndb_select_all halts with an error (and does not retry the scan of the table).
When the result is empty, ndb_select_all retries the scan, reusing the old header.
(Bug #35510814)
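The new retry rules can be summarized as a small decision function. This is a sketch of the behavior described above, not the ndb_select_all source; ScanOutcome, nextStep, and shouldPrintHeader are hypothetical names.

```cpp
// Possible follow-ups after one scan attempt (hypothetical names).
enum class ScanOutcome { Done, RetryScan, FailWithError };

// Decide what to do after a scan attempt:
//  - all rows read             -> done
//  - partial, non-empty result -> stop with an error (a retry would
//                                 re-emit rows and look like duplicates)
//  - partial, empty result     -> safe to retry the scan
constexpr ScanOutcome nextStep(bool readAllRows, unsigned long rowsReturned) {
    return readAllRows        ? ScanOutcome::Done
         : rowsReturned > 0   ? ScanOutcome::FailWithError
                              : ScanOutcome::RetryScan;
}

// The header is printed on the first attempt only and reused on retries.
constexpr bool shouldPrintHeader(unsigned attempt) {
    return attempt == 0;
}
```

Retrying only when nothing has been output yet is what prevents both the spurious duplicate-row reports and the repeated header.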
Following a node connection failure, the transporter registry's error state was not cleared before initiating a reconnect, so the error that originally caused the disconnection might still be set; this stale error was then interpreted as a failure to reconnect. (Bug #35774109)
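The effect of the stale error state can be shown with a minimal model. The sketch below is an assumption-laden illustration, not the TransporterRegistry implementation; TransporterState, onDisconnect, prepareReconnect, and reconnectLooksFailed are hypothetical names, and the error code 7 is arbitrary.

```cpp
// Minimal model of per-connection state (hypothetical).
struct TransporterState {
    int  lastError;   // 0 means "no error recorded"
    bool connected;
};

// A connection failure records the error and marks the link as down.
constexpr TransporterState onDisconnect(int errorCode) {
    return TransporterState{errorCode, false};
}

// Fixed behavior: clear the recorded error before a reconnect attempt.
constexpr TransporterState prepareReconnect(TransporterState s) {
    return TransporterState{0, s.connected};
}

// A reconnect attempt is judged to have failed if an error is recorded.
constexpr bool reconnectLooksFailed(TransporterState s) {
    return s.lastError != 0;
}
```

Without the clearing step, the state produced by onDisconnect() is passed straight to the failure check, so the old error is misread as a new one.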
When a TransporterRegistry (TR) instance connects to a management server, it first uses the MGM API, and then converts the connection to a Transporter connection for further communication. The initial connection had an excessively long timeout (60 seconds) so that, in the case of a cluster having two management servers where one was unavailable, clients were forced to wait until this management server timed out before being able to connect to the available one.
We fix this by setting the MGM API connection timeout to 5000 milliseconds, which is equal to the timeout used by the TR for getting and setting dynamic ports. (Bug #35714466)
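To make the effect of the shorter timeout concrete, the sketch below computes the delay a client incurs before it reaches an available management server. It assumes that servers are tried one after another and that each unavailable server consumes the full connection timeout; worstCaseWaitMs and the two constants are hypothetical names, with the values 60 seconds and 5000 milliseconds taken from the text.

```cpp
// Timeouts mentioned in the note, in milliseconds.
constexpr unsigned long kOldTimeoutMs = 60000;  // previous 60-second timeout
constexpr unsigned long kNewTimeoutMs = 5000;   // timeout after the fix

// Delay before the first available server is tried, assuming sequential
// attempts in which every unavailable server uses up the whole timeout.
constexpr unsigned long worstCaseWaitMs(unsigned long unavailableServersFirst,
                                        unsigned long timeoutMs) {
    return unavailableServersFirst * timeoutMs;
}
```

Under these assumptions, a client with two management servers whose first choice is down waits worstCaseWaitMs(1, kOldTimeoutMs) before the fix and worstCaseWaitMs(1, kNewTimeoutMs) after it.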
Values for causes of conflicts used in conflict resolution exceptions tables were misaligned such that the order of ROW_ALREADY_EXISTS and ROW_DOES_NOT_EXIST was reversed. (Bug #35708719)
In cases where the distributed global checkpoint (GCP) protocol stops making progress, this is detected and optionally handled by the GCP monitor, with handling as determined by the TimeBetweenEpochsTimeout and TimeBetweenGlobalCheckpointsTimeout data node parameters.
The LCP protocol is mostly node-local, but depends on the progress of the GCP protocol at the end of a local checkpoint (LCP); this means that, if the GCP protocol stalls, LCPs may also stall in this end state. If the LCP watchdog detects that the LCP is stalled in this end state, it should defer to the GCP monitor to handle the situation, since the GCP monitor is distribution-aware.
If no GCP monitor limit is set (TimeBetweenEpochsTimeout is equal to 0), no handling of GCP stalls is performed by the GCP monitor. In this case, the LCP watchdog was still taking action which could eventually lead to cluster failure; this fix corrects this misbehavior so that the LCP watchdog no longer takes any such action. (Bug #29885899)
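The corrected division of responsibility can be sketched as two small functions. This is an illustrative model of the behavior described above, not the data node code; WatchdogAction, lcpWatchdogAction, and gcpMonitorHandlesStalls are hypothetical names, and the HandleLocally case for stalls outside the end state is an assumption not stated in the note.

```cpp
// What the LCP watchdog does about a checkpoint (hypothetical names).
enum class WatchdogAction { None, DeferToGcpMonitor, HandleLocally };

// The GCP monitor acts on GCP stalls only when a limit is configured;
// TimeBetweenEpochsTimeout equal to 0 means no limit is set.
constexpr bool gcpMonitorHandlesStalls(unsigned timeBetweenEpochsTimeoutMs) {
    return timeBetweenEpochsTimeoutMs != 0;
}

// Fixed behavior: an LCP stalled in its end state (waiting on GCP
// progress) is always left to the GCP monitor, even when that monitor
// is configured to take no action.
constexpr WatchdogAction lcpWatchdogAction(bool lcpStalled,
                                           bool stalledInEndStateWaitingForGcp) {
    return !lcpStalled                     ? WatchdogAction::None
         : stalledInEndStateWaitingForGcp  ? WatchdogAction::DeferToGcpMonitor
                                           : WatchdogAction::HandleLocally;
}
```

With TimeBetweenEpochsTimeout set to 0, deferring means that neither component acts on the stall, which matches the statement that the LCP watchdog no longer takes any such action.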
Previously, when a timeout was detected during transaction commit and completion, the transaction coordinator (TC) switched to a serial commit-complete execution protocol, which slowed commit-complete processing for large transactions, affecting GCP_COMMIT delays and epoch sizes. Instead of switching in such cases, the TC now continues waiting for parallel commit-complete, periodically logging a transaction summary, with states and nodes involved. (Bug #22602898)
References: See also: Bug #35260944.