MySQL NDB Cluster 7.6 Release Notes
Microsoft Windows: NDB Cluster did not compile correctly using Visual Studio 2022. (Bug #35967676)
NDB Cluster did not compile correctly on Ubuntu 23.10. (Bug #35847193)
When a node failure is detected, transaction coordinator (TC) instances check their own transactions to determine whether they need handling to ensure completion, implemented by checking whether each transaction involves the failed node, and if so, marking it for immediate timeout handling. This causes the transaction to be either rolled forward (commit) or back (abort), depending on whether it had started committing, using the serial commit protocol. When the TC was in the process of getting permission to commit (CS_PREPARE_TO_COMMIT), sending commit requests (CS_COMMITTING), or sending completion requests (CS_COMPLETING), timeout handling waited until the transaction was in a stable state before commencing the serial commit protocol.
Prior to the fix for Bug #22602898, all timeouts during CS_COMPLETING or CS_COMMITTING resulted in switching to the serial commit-complete protocol, so skipping the handling in any of the three states cited previously did not stop the prompt handling of the node failure. It was found later that this fix removed the blanket use of the serial commit-complete protocol for commit-complete timeouts, so that when handling for these states was skipped, no node failure handling action was taken, with the result that such transactions hung in a commit or complete phase, blocking checkpoints.
The fix for Bug #22602898 removed this stable state handling to avoid it accidentally triggering, but this change also stopped it from triggering when needed, in the case where node failure handling found a transaction in a transient state. We solve this problem by modifying CS_COMMIT_SENT and CS_COMPLETE_SENT stable state handling to perform node failure processing if a timeout has occurred for a transaction with a failure number different from the current latest failure number, ensuring that all transactions involving the failed node are in fact eventually handled. (Bug #36028828)
References: See also: Bug #22602898.
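For illustration, the repaired timeout path can be sketched as follows. This is a hedged reconstruction, not the actual DBTC implementation; the Transaction structure, handleTimeout() function, and g_latest_failure_nr counter are hypothetical, while the CS_* state names come from the note above.

    #include <cstdint>

    // Hypothetical transaction states named after those described above.
    enum ConnectionState {
      CS_PREPARE_TO_COMMIT,  // getting permission to commit
      CS_COMMITTING,         // sending commit requests
      CS_COMPLETING,         // sending completion requests
      CS_COMMIT_SENT,        // stable: commit requests all sent
      CS_COMPLETE_SENT       // stable: completion requests all sent
    };

    struct Transaction {
      ConnectionState state;
      uint32_t failure_nr;   // node-failure number seen by this transaction
    };

    uint32_t g_latest_failure_nr;                   // bumped on each detected node failure
    void startSerialCommitComplete(Transaction &);  // roll forward or back serially

    // Sketch of the fix: in the stable states, a timed-out transaction whose
    // failure number lags the latest one missed node failure handling while in
    // a transient state, so that handling is performed now.
    void handleTimeout(Transaction &trans) {
      if (trans.state == CS_COMMIT_SENT || trans.state == CS_COMPLETE_SENT) {
        if (trans.failure_nr != g_latest_failure_nr) {
          trans.failure_nr = g_latest_failure_nr;
          startSerialCommitComplete(trans);         // node failure processing
        }
      }
    }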
It was possible for the readln_socket() function in storage/ndb/src/common/util/socket_io.cpp to read one character too many from the buffer passed to it as an argument. (Bug #35857936)
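The defect is an instance of a common off-by-one pattern in buffer scanning. A minimal sketch of this class of bug (not the actual socket_io.cpp code) follows; read_buffered_line() and its parameters are hypothetical.

    #include <cstddef>

    // Scans up to buflen characters of buf for a newline. The faulty variant
    // of this pattern uses i <= buflen as the loop bound, inspecting
    // buf[buflen], one character past the end of the caller's buffer.
    long read_buffered_line(const char *buf, size_t buflen) {
      for (size_t i = 0; i < buflen; i++) {   // correct bound: i < buflen
        if (buf[i] == '\n')
          return static_cast<long>(i + 1);    // line length including newline
      }
      return -1;                              // no complete line in buffer
    }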
The slow disconnection of a data node while a management server was unavailable could sometimes interfere with the rolling restart process. This became especially apparent when the cluster was hosted by NDB Operator, and the old mgmd pod did not recognize the IP address change of the restarted data node pod; this was visible as discrepancies in the output of SHOW STATUS on different management nodes.
We fix this by clearing any cached address when connecting to a data node, so that the data node's new address (if any) is used instead. (Bug #35667611)
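A minimal sketch of this approach, with hypothetical names (g_addr_cache, resolve_node_address(), open_transporter()) and assuming the address cache is keyed by node ID:

    #include <map>
    #include <string>

    std::map<int, std::string> g_addr_cache;        // cached data node addresses
    std::string resolve_node_address(int node_id);  // fresh lookup (assumed)
    bool open_transporter(const std::string &addr); // connect (assumed)

    // Sketch of the fix: drop any cached address before connecting, so a data
    // node that restarted with a new address (such as a new pod IP under NDB
    // Operator) is reached at its current address rather than the stale one.
    bool connect_to_data_node(int node_id) {
      g_addr_cache.erase(node_id);                  // clear stale cache entry
      std::string addr = resolve_node_address(node_id);
      g_addr_cache[node_id] = addr;
      return open_transporter(addr);
    }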
The maximum permissible value for the oldest restorable global checkpoint ID is MAX_INT32 (4294967295). An ID greater than this value causes the data node to shut down, requiring a backup and restore on a cluster started with --initial.
Now, approximately 90 days before this limit is reached under normal usage, an appropriate warning is issued, allowing time to plan the required corrective action. (Bug #35641420)
References: See also: Bug #35749589.
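The 90-day margin follows from back-of-the-envelope arithmetic. Assuming the default TimeBetweenGlobalCheckpoints of 2000 milliseconds (one new GCI every two seconds), the sketch below computes where such a warning threshold would fall; the server's actual computation may differ.

    #include <cstdint>
    #include <iostream>

    int main() {
      const uint64_t max_gci = 4294967295ULL;  // limit cited above
      const uint64_t gcp_ms  = 2000;           // TimeBetweenGlobalCheckpoints default
      const uint64_t gci_per_day = 86400000ULL / gcp_ms;  // 43,200 GCIs per day
      const uint64_t margin = 90 * gci_per_day;           // 3,888,000 GCIs
      std::cout << "warn once oldest restorable GCI exceeds "
                << (max_gci - margin) << "\n";            // 4,291,079,295
      return 0;
    }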
Subscription reports were sent out too early by SUMA during a node restart, which could lead to schema inconsistencies between cluster SQL nodes. In addition, an issue with the ndbinfo restart_info table meant that restart phases for nodes that did not belong to any node group were not always reported correctly. (Bug #30930132)
Online table reorganization inserts rows from existing table fragments into new table fragments; then, after committing the inserted rows, it deletes the original rows. It was found that the inserts caused SUMA triggers to fire, and binary logging to occur, which led to the following issues:
- It was inconsistent: DDL is generally logged as one or more statements, if at all, rather than by row-level effects.
- It was incorrect: only the writes were logged, but not the deletes.
- It was unsafe: tables with blobs did not receive the associated row changes required to form valid binary log events.
- It used CPU and other resources needlessly.
For tables with no blob columns, this was primarily a performance issue; for tables having blob columns, it was possible for this behavior to result in unplanned shutdowns of mysqld processes performing binary logging and perhaps even data corruption downstream. (Bug #19912988)
References: See also: Bug #16028096, Bug #34843617.
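The intent of the fix can be illustrated with a short sketch; the RowOp structure and should_fire_suma_trigger() function are hypothetical, not the actual SUMA code. Row operations performed internally by the reorganization are marked so that they fire no subscription triggers and therefore generate no row-level binary log events:

    struct RowOp {
      bool reorg_internal;  // set for the copy inserts and cleanup deletes
                            // performed by online table reorganization
    };

    // Reorganization moves rows between fragments without changing the table's
    // contents, so its internal writes are not user-visible changes and should
    // not be reported to subscribers or logged row by row.
    bool should_fire_suma_trigger(const RowOp &op) {
      return !op.reorg_internal;
    }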