MySQL NDB Cluster 8.0 Release Notes
This release is no longer available for download. It was removed due to a
critical issue that could cause data in InnoDB tables having added columns
to be interpreted incorrectly. Please upgrade to MySQL Cluster 8.0.30 instead.
NDB could not be built using GCC 11 due to an array out-of-bounds error.
(Bug #33459671)
Removed a number of -Wstringop-truncation warnings raised when compiling
NDB with GCC 9, as well as suppression of such warnings. Also removed
unneeded includes from the header file ndb_global.h.
(Bug #32233543)
Eight new tables providing NDB dictionary information about database objects
have been added to the ndbinfo information database. This makes it possible
to obtain a great deal of information of this type by issuing queries in the
mysql client, without the need to use ndb_desc, ndb_select_all, and similar
utilities. (It is still necessary to use ndb_desc to obtain fragment
distribution information.) These tables are listed here, together with the
NDB objects about which they provide information:
blobs: Blob tables
dictionary_columns: Table columns
dictionary_tables: Tables
events: Event subscriptions
files: Files used by disk data tables
foreign_keys: Foreign keys
hash_maps: Hash maps
index_columns: Table indexes
An additional change in ndbinfo is that only files and hash_maps are defined
as views; the remaining six tables listed previously are in fact base tables,
even though they are not named using the ndb$ prefix. As a result, these
tables are not hidden as other ndbinfo base tables are.
For more information, see the descriptions of the tables in ndbinfo: The NDB Cluster Information Database. (WL #11968)
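The new tables can be queried like any other ndbinfo tables from the mysql
client; two minimal examples, using only the table names listed above:

mysql> SELECT * FROM ndbinfo.dictionary_tables;
mysql> SELECT * FROM ndbinfo.dictionary_columns;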
ndbcluster plugin threads can now be seen in the Performance Schema. The
threads and setup_threads tables show all three of these threads: the binary
logging thread (ndb_binlog thread), the index statistics thread
(ndb_index_stat thread), and the metadata thread (ndb_metadata thread).
This makes it possible to obtain the thread IDs and thread OS IDs of these threads for use in queries on these and other Performance Schema tables.
For more information and examples, see ndbcluster Plugin Threads. (WL #15000)
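For example, a query along the following lines retrieves the thread ID and
OS thread ID for each of the three threads (the exact NAME strings are as
reported by your server):

mysql> SELECT NAME, THREAD_ID, THREAD_OS_ID
    ->     FROM performance_schema.threads
    ->     WHERE NAME LIKE '%ndbcluster%';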
NDB Cluster APIs:
The NDB API now implements a List::clear() method which clears all data from
a list. This makes it simpler to reuse an existing list with the Dictionary
methods listEvents(), listIndexes(), and listObjects().
In addition, the List destructor has been modified such that it now calls
clear() before attempting the removal of any elements or attributes from the
list being destroyed.
(Bug #33676070)
The client receive thread was enabled only when under high load, where the
criterion for determining “high load” was that the number of clients waiting
in the poll queue (the receive queue) was greater than
min_active_clients_recv_thread (default: 8).
This was a poor metric for determining high load, since a single
client, such as the binary log injector thread handling incoming
replication events, could experience high load on its own as
well. The same was true of a pushed join query (in which very
large batches of incoming TRANSID_AI
signals
are received).
We change the receive thread such that it now sleeps in the poll queue rather than being deactivated completely, so that it is now always available for handling incoming signals, even when the client is not under high load. (Bug #33752914)
It is now possible to restore the ndb_apply_status table from an NDB backup,
using ndb_restore with the --with-apply-status option added in this release.
In some cases, this information can be useful in setting up new replication
links.
--with-apply-status restores all rows of the ndb_apply_status table except
for the row for which the server_id value is 0; use --restore-epoch to
restore this row.
To use the --with-apply-status
option, you must
also supply --restore-data
when invoking ndb_restore.
For more information, see the description of the
--with-apply-status
option in the Reference
Manual, as well as
ndb_apply_status Table.
(Bug #32604161, Bug #33594652)
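After running ndb_restore with both --restore-data and --with-apply-status,
the restored rows can be verified from any SQL node; a minimal check, using
the table's standard columns:

mysql> SELECT server_id, epoch FROM mysql.ndb_apply_status;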
Previously, when a user query attempted to open an NDB table with a missing
(or broken) index, the MySQL server raised NDB error 4243 Index not
found. Now when such an attempt is made, it is handled as described here:
If the query does not make use of the problematic index, the query succeeds with no errors or warnings.
If the query attempts to use the missing or broken index, the query is
rejected with a warning from NDB (Index idx is not available in NDB.
Use "ALTER TABLE tbl ALTER INDEX idx INVISIBLE" to prevent MySQL from
attempting to access it, or use "ndb_restore --rebuild-indexes" to
rebuild it) and an error (ER_NOT_KEYFILE).
The rationale for this change is that constraint violations or
missing data sometimes make it impossible to restore an index on
an NDB
table, in which case, running
ndb_restore with
--disable-indexes
restores
the data without the index. With this change, once the data is
restored from backup, it is possible to use SQL to fix any
corrupt data and rebuild the index.
(Bug #28584066, WL #14867)
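As suggested by the warning text, a missing or broken index can be hidden
from the optimizer until it has been rebuilt; a hypothetical example using
table t1 and index idx1:

mysql> ALTER TABLE t1 ALTER INDEX idx1 INVISIBLE;

Once the index has been rebuilt (for example, by running ndb_restore
--rebuild-indexes), it can be made visible again with ALTER TABLE t1 ALTER
INDEX idx1 VISIBLE.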
Important Change:
The maximum value supported for the --ndb-batch-size server option has been
increased from 31536000 to 2147483648 (2 GB).
(Bug #21040523)
Performance:
When profiling multithreaded data nodes (ndbmtd) performing a transaction
including a large number of inserts, it was found that more than 50% of CPU
time was spent in the internal method Dblqh::findTransaction(). It was found
that, when there were many operations belonging to uncommitted transactions
in the hash list searched by this method, the hash buckets were overfilled,
the result being that an excessive number of CPU cycles were consumed
searching through the hash buckets.
To address this problem, we fix the number of hash buckets at 4095, and scale the size of a hash bucket relative to the maximum number of operations, so that only relatively few items should now be placed in the same bucket. (Bug #33803541)
References: See also: Bug #33803487.
Performance:
When inserting a great many rows into an empty or small table in the same
transaction, the rate at which rows were inserted quickly declined to less
than 50% of the initial rate; it was subsequently found that roughly 50% of
all CPU time was spent in Dbacc::getElement(), and the root cause was
identified as the timing of resizing the structures used by DBACC for
storing elements, which grow as more rows are inserted in the same
transaction and shrink following a commit.
We fix this issue by checking whether a resize is needed immediately following the insertion or deletion of an element. This also handles the subsequent rejection of an insert. (Bug #33803487)
References: See also: Bug #33803541.
Performance:
A considerable amount of time was being spent searching the event buffer
data hash (using the internal method EventBufData_hash::search()), due to
the following issues:
The number of buckets proved to be too low under high load, when the hash bucket list could become very large.
The hash buckets were implemented using a linked list. Traversing a long linked list can be highly inefficient.
We fix these problems by using a vector (std::vector) rather than a linked
list, and by making the array containing the set of hash buckets expandable.
(Bug #33796754)
Performance:
The internal function computeXorChecksum()
was implemented such that great care was taken to aid the
compiler in generating optimal code, but it was found that it
consumed excessive CPU resources, and did not perform as well as
a simpler implementation. This function is now reimplemented
with a loop summing up XOR
results
over an array, which appears to result in better optimization
with both GCC and Clang compilers.
(Bug #33757412)
Microsoft Windows:
The CompressedLCP data node configuration parameter had no effect on Windows
platforms.
When upgrading to this release, Windows users should verify the setting for
CompressedLCP; if it was previously enabled, you may experience an increase
in CPU usage by I/O threads following the upgrade, when under load, when
restoring data as part of a node restart, or in both cases. If this behavior
is not desired, disable CompressedLCP.
(Bug #33727690)
Microsoft Windows:
The internal function
Win32AsyncFile::rmrfReq()
did not always
check for both ERROR_FILE_NOT_FOUND and
ERROR_PATH_NOT_FOUND when either
condition was likely.
(Bug #33727647)
Microsoft Windows: Corrected several minor issues that occurred with file handling on Windows platforms. (Bug #33727629)
NDB Cluster APIs:
Hash key generation using the internal API method NdbBlob::getBlobKeyHash()
ignored the most significant byte of the key. This unnecessarily caused
uneven distribution in the NDB API blob hash list, resulting in an increased
need for comparing key values, and thus more CPU usage.
(Bug #33803583)
References: See also: Bug #33783274.
NDB Cluster APIs:
Removed an unnecessary assertion that could be hit when iterating through
the list returned by Dictionary::listEvents().
(Bug #33630835)
Builds on Ubuntu 21.10 using GCC 11 stopped with -Werror=maybe-uninitialized. (Bug #33976268)
In certain cases, NDB
did not handle node IDs
of data nodes correctly.
(Bug #33916404)
In some cases, NDB
did not validate all node
IDs of data nodes correctly.
(Bug #33896409)
In some cases, array indexes were not handled correctly. (Bug #33896389, Bug #33896399, Bug #33916134)
In some cases, integers were not handled correctly. (Bug #33896356)
As part of work done in NDB 8.0.23 to implement the
AutomaticThreadConfig
configuration parameter, the maximum numbers of LQH and TC
threads supported by ndbmtd were raised from
129 each to 332 and 160, respectively. This adversely affected
the performance of execSEND_PACKED()
methods
implemented by several NDB kernel blocks, which complete sending
of packed signals when the scheduler is about to suspend
execution of the current block thread. This was due to
continuing simply to iterate over the arrays of such threads
despite the arrays' increased size. We fix this by using a
bitmask to track the thread states alongside the full arrays.
(Bug #33856371)
When operating on blob columns, NDB
must add
extra operations to read and write the blob head column and blob
part rows. These operations are added to the tail of the
transaction's operation list automatically when the transaction
is executed.
To insert a new operation prior to a given operation, it was necessary to
traverse the operation list from the beginning until the desired operation
was found, at a cost proportional to the length L of the list of preceding
operations. The total cost is thus approximately L^2 / 2, increasing as more
operations are added to the list; when a large number of operations
modifying blobs were defined in a batch, this traversal cost was paid for
each operation. This had a noticeable impact on performance when reading and
writing blobs.
We fix this by using list splicing in
NdbTransaction::execute()
to
eliminate unnecessary traversals of this sort when defining blob
operations.
(Bug #33797931)
The block thread scheduler makes frequent calls to
update_sched_config()
to update its
scheduling strategy. That involves checking the fill degree of
the job buffer queues used to send signals between the nodes'
internal block threads. When these queues are about to fill up,
the thread scheduler assigns a smaller value to
max_signals
for the next round, in order to
reduce the pressure on the job buffers. When the minimum free
threshold has been reached, the scheduler yields the CPU while
waiting for the consumer threads to free some job buffer slots.
The fix in NDB 8.0.18 for a previous issue introduced a mechanism whereby the main thread was allowed to continue executing even when this lower threshold had been reached; in some cases the main thread consumed all job buffers, including those held in reserve, leading to an unplanned shutdown of the data node due to resource exhaustion. (Bug #33792362, Bug #33872577)
References: This issue is a regression of: Bug #29887068.
Setting up a cluster with one LDM thread and one query thread using the
ThreadConfig parameter (for example,
ThreadConfig=ldm={cpubind=1},query={cpubind=2}) led to unplanned shutdowns
of data nodes.
This was due to internal thread variables being assigned the wrong values when there were no main or request threads explicitly assigned. Now we make sure in such cases that these are assigned the thread number of the first receive thread, as expected. (Bug #33791270)
NdbEventBuffer
hash key generation for
non-character data reused the same 256 hash keys; in addition,
strings of zero length were ignored when calculating hash keys.
(Bug #33783274)
The collection of NDB API statistics based on the
EventBytesRecvdCount
event counter incurred
excessive overhead. Now this counter is updated using a value
which is aggregated as the event buffer is filled, rather than
traversing all of the event buffer data in a separate function
call.
For more information, see NDB API Statistics Counters and Variables. (Bug #33778923)
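On an SQL node, the aggregated counter is visible among the Ndb_api status
variables; a quick way to inspect it (the variable name pattern is assumed
from the standard NDB API counter naming):

mysql> SHOW GLOBAL STATUS LIKE 'Ndb_api_event_bytes%';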
The internal method
THRConfig::reorganize_ldm_bindings()
behaved
unexpectedly, in some cases changing thread bindings after
AutomaticThreadConfig
had already bound the threads to the correct CPUs. We fix this
by removing the method, no longer using it when parsing
configuration data or adding threads.
(Bug #33764260)
The receiver thread ID was hard-coded in the internal method
TransporterFacade::raise_thread_prio()
such
that it always acted to raise the priority of the receiver
thread, even when called from the send thread.
(Bug #33752983)
A fix in NDB 8.0.28 addressed an issue with the code used by
various NDB
components, including
Ndb_index_stat
, that checked whether the data
nodes were up and running. In clusters with multiple SQL nodes,
this resulted in an increase in the frequency of race conditions
between index statistics threads trying to create a table event
on the ndb_index_stat_head
table; that is, it
was possible for two SQL nodes to try to create the event at the
same time, with the losing SQL node raising Error 746
Event name already exists. Due to this
error, the binary logging thread ended up waiting for the index
statistics thread to signal that its own setup was complete, and
so the second SQL node timed out with Could not
create index stat system tables after
--ndb-wait-setup
seconds.
(Bug #33728909)
References: This issue is a regression of: Bug #32019119.
On a write error, the message printed by ndbxfrm referenced the source file rather than the destination file. (Bug #33727551)
A complex nested join was rejected with the error
FirstInner/Upper has to be an ancestor or a
sibling, which is thrown by the internal
NdbQueryOperation
interface used to define a
pushed join in the SPJ API, indicating that the join-nest
dependencies for the interface were not properly defined.
The query showing the issue had the join nest structure
t2, t1, (t3, (t5, t4)). Neither of the join conditions on t5 or t4 had any
references or explicit dependencies on table t3, but each had an implicit
dependency on t3 by virtue of being in a nest within the same nest as t3.
When preparing a pushed join, NDB tracks all required table dependencies
between tables and join nests by adding them to the m_ancestor bitmask for
each table. Nest-level dependencies should all be added to the first table
in the relevant nest. When the relevant dependencies for a specific table
are calculated, they include the set of all tables explicitly referred to in
the join condition, plus any implicit dependencies due to the join nests of
which the table is a member, limited by the uppermost table referred to in
the join condition.
For this particular join query we did not properly take into
account that there might not be any references to tables in the
closest upper nest (the nest starting with
t3
); in such cases we are dependent on all
nests up to the nest containing the uppermost table referenced.
We fix the issue by introducing a while-loop in which we add
ancestor nest dependencies until we reach this uppermost table.
(Bug #33670002)
When the transient memory pool (TransientPool) used internally by NDB grew
above 256 MB, subsequent attempts to shrink the pool caused an error which
eventually led to an unplanned shutdown of the data node.
(Bug #33647601)
A check that the connection to NDB has been set up is now performed before
querying for partition statistics.
(Bug #33643512)
When the ordered index PRIMARY
was not
created for the ndb_sql_metadata
table,
application of stored grants could not proceed due to the
missing index.
We fix this by protecting creation of utility tables (including
ndb_sql_metadata) by wrapping the associated
CREATE TABLE
statement with a
schema transaction, thus handling rejection of the statement by
rollback. In addition, in the event the newly-created table is
not created correctly, it is dropped. These changes avoid
leaving behind a table that is only partially created, so that
the next attempt to create the utility table starts from the
beginning of the process.
(Bug #33634453)
Removed -Wmaybe-uninitialized
warnings which
occurred when compiling NDB Cluster with GCC 11.2.
(Bug #33611915)
NDB accepted an arbitrary (and invalid) string of characters following a
numeric parameter value in the config.ini global configuration file. For
example, it was possible to use either OverloadLimit=10 "M12L" or
OverloadLimit=10 M (which contains a space) and have it interpreted as
OverloadLimit=10M.
It was also possible to use a bare letter suffix in place of an
expected numeric value, such as
OverloadLimit=M
, and have it interpreted as
zero. This happened as well with an arbitrary string whose first
letter was one of the MySQL standard modifiers
K
, M
, or
G
; thus,
OverloadLimit=MAX_UINT
also had the effect of
setting OverloadLimit
to zero.
Now, only one of the suffixes K
,
M
, or G
is accepted with a
numeric parameter value, and it must follow the numeric value
immediately, with no intervening whitespace characters or
quotation marks. In other words, to set
OverloadLimit
to 10 megabytes, you must use
one of OverloadLimit=10000000
,
OverloadLimit=10M
, or
OverloadLimit=10000K
.
To maintain availability, you should check your
config.ini
file for any settings that do
not conform to the rule enforced as a result of this change
and correct them prior to upgrading. Otherwise, the cluster
may not be able to start afterwards, until you rectify the
issue.
(Bug #33589961)
Enabling
AutomaticThreadConfig
with fewer than 8 CPUs available led to unplanned shutdowns of
data nodes.
(Bug #33588734)
Removed the unused source files buddy.cpp and buddy.hpp from
storage/ndb/src/common/transporter/.
(Bug #33575155)
The NDB stored grants mechanism now sets the session variable
print_identified_with_as_hex to true, so that password hashes stored in the
ndb_sql_metadata table are formatted as hexadecimal values rather than being
formatted as strings.
(Bug #33542052)
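For reference, the effect of this variable can be observed in any client
session; an illustrative sketch (the user name is hypothetical):

mysql> SET SESSION print_identified_with_as_hex = ON;
mysql> SHOW CREATE USER 'jan'@'localhost';

With the variable enabled, the authentication string in the IDENTIFIED WITH
clause is displayed as a hexadecimal literal rather than as a (possibly
unprintable) string.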
Binary log thread event handling includes optional high-verbosity logging
which, when enabled and the connection to NDB is lost, produces an excess of
log messages like these:
datetime 2 [Note] [MY-010866] [Server] NDB Binlog: cluster failure for epoch 55/0.
datetime 2 [Note] [MY-010866] [Server] NDB Binlog: cluster failure for epoch 55/0.
Such repeated log messages, not being of much help in diagnosing errors, have been removed. This leaves a similar log message in such cases, from the handling of schema distribution event operation teardown. (Bug #33492244)
Historically, a number of different methods have been used to
enforce compile-time checks of various interdependencies and
assumptions in the NDB
codebase in a portable
way. Since the standard static_assert()
function is now always available, the
NDB_STATIC_ASSERT
and
STATIC_ASSERT
macros have been replaced with
direct usage of static_assert()
.
(Bug #33466577)
When the internal AbstractQueryPlan
interface
determined the access type to be used for a specific table, it
tried to work around an optimizer problem where the
ref
access type was specified for a table and
later turned out to be accessible by eq_ref
.
The workaround introduced a new issue by sometimes determining eq_ref access
for a table actually needing ref access; in addition, the prior fix did not
take into account UNIQUE USING HASH indexes, which need either eq_ref or
full table scan access, even when the MySQL Optimizer regards them as ref
access.
We fix this by first removing the workaround (which had been
made obsolete by the proper fix for the previous issue), and
then by introducing the setting of eq_ref
or
full_table_scan
access for hash indexes.
(Bug #33451256)
References: This issue is a regression of: Bug #28965762.
When a pushed join is prepared but not executed, the
Ndb_pushed_queries_dropped status variable is incremented. Now, in addition,
NDB emits a warning Prepared pushed join could not be executed... which
is passed to ER_GET_ERRMSG.
(Bug #33449000)
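This counter, together with its companion pushed-join counters, can be
checked from the mysql client; for example:

mysql> SHOW GLOBAL STATUS LIKE 'Ndb_pushed_queries%';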
The deprecated -r option for ndbd has been removed. This change also removes
extraneous text from the output of ndbd --help.
(Bug #33362935)
References: See also: Bug #31565810.
ndb_import sometimes could not correctly parse a .csv file containing
Windows/DOS-style (\r\n) line terminators.
(Bug #32006725)
The ndb_import tool handled only the hidden primary key which is defined by
NDB when a table does not have an explicit primary key. This caused an error
when inserting a row containing NULL for an auto-increment primary key
column, even though the same row was accepted by LOAD DATA INFILE.
We fix this by adding support for importing a table with one or more
instances of NULL in an auto-increment primary key column. This includes a
check that a table has no more than one auto-increment column; if this
column is nullable, it is redefined by ndb_import as NOT NULL, and any
occurrence of NULL in this column is replaced by a generated auto-increment
value before inserting the row into NDB.
(Bug #30799495)
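For comparison, this is the LOAD DATA INFILE behavior that ndb_import now
matches; a minimal sketch with a hypothetical NDB table whose auto-increment
primary key receives generated values wherever the CSV file contains \N
(NULL):

mysql> CREATE TABLE t1 (
    ->     id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
    ->     val VARCHAR(32)
    -> ) ENGINE=NDB;
mysql> LOAD DATA INFILE 't1.csv' INTO TABLE t1
    ->     FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n';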
When a node failure is detected, surviving nodes in the same nodegroup as this node attempt to resend any buffered change data to event subscribers. In cases in which there were no outstanding epoch deliveries, that is, the list of unacknowledged GCIs was empty, the surviving nodes made the incorrect assumption that this list would never be empty. (Bug #30509416)
When a copying ALTER TABLE of the parent table for a foreign key was
executed and the SQL node terminated prior to completion, an extraneous
temporary table with (additional, temporary) foreign keys on all child
tables remained. One consequence of this issue was that it was not possible
to restore a backup made using mysqldump --no-data.
To fix this, NDB
now performs cleanup of
temporary tables whenever a mysqld process
connects (or reconnects) to the cluster.
(Bug #24935788, Bug #29892252)
An unplanned data node shutdown occurred following a bus error
on Mac OS X for ARM. We fix this by moving the call to
NdbCondition_Signal()
(in
AsyncIoThread.cpp
) such that it executes
prior to NdbMutex_Unlock()
—that is,
into the mutex, so that the condition being signalled is not
lost during execution.
(Bug #105522, Bug #33559219)
In DblqhMain.cpp
, a missing return in the
internal execSCAN_FRAGREQ()
function led to
an unplanned shutdown of the data node when inserting a nonfatal
error. In addition, the condition
!seize_op_rec(tcConnectptr)
present in the
same function was never actually checked.
(Bug #105051, Bug #33401830, Bug #33671869)
It was possible to set any of
MaxNoOfFiredTriggers
,
MaxNoOfLocalScans
, and
MaxNoOfLocalOperations
concurrently with
TransactionMemory
,
although this is not allowed.
In addition, it was not possible to set any of
MaxNoOfConcurrentTransactions
,
MaxNoOfConcurrentOperations
,
or
MaxNoOfConcurrentScans
concurrently with TransactionMemory
, although
there is no reason to prevent this.
In both cases, the concurrent settings behavior now matches the
documentation for the TransactionMemory
parameter.
(Bug #102509, Bug #32474988)
When a redo log part is unable to accept an operation's log entry immediately, the operation (a prepare, commit, or abort) is queued, or (prepare only) optionally aborted. By default operations are queued.
This mechanism was modified in 8.0.23 as part of decoupling
local data managers and redo log parts, and introduced a
regression whereby it was possible for queued operations to
remain in the queued state until all activity on the log part
quiesced. When this occurred, operations could remain queued
until DBTC
declared them
timed out, and aborted them.
(Bug #102502, Bug #32478380)