MySQL NDB Cluster 8.0 Release Notes
Important Change: As part of the terminology changes begun in MySQL 8.0.21 and NDB 8.0.21, the ndb_slave_conflict_role system variable is now deprecated and is being replaced with ndb_conflict_role.
In addition, a number of status variables have been deprecated and are being replaced, as shown in the following table:
Table 2 Deprecated NDB status variables and their replacements
Also as part of this work, the ndbinfo.table_distribution_status table's tab_copy_status column values ADD_TABLE_MASTER and ADD_TABLE_SLAVE are deprecated, and are replaced by ADD_TABLE_COORDINATOR and ADD_TABLE_PARTICIPANT, respectively.
Finally, the --help output of some NDB utility programs, such as ndb_restore, has been updated. (Bug #31571031)
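For existing configurations, moving to the new variable name can be as simple as the following statement; this is a hedged illustration only, with the value PRIMARY chosen as one example of the variable's permitted values:

    -- Use the new variable name in place of the deprecated ndb_slave_conflict_role
    SET GLOBAL ndb_conflict_role = 'PRIMARY';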
NDB Client Programs: Effective with this release, the MySQL NDB Cluster Auto-Installer (ndb_setup.py) has been removed from the NDB Cluster binary and source distributions, and is no longer supported. (Bug #32084831)
References: See also: Bug #31888835.
ndbmemcache: ndbmemcache, which was deprecated in the previous release of NDB Cluster, has now been removed from NDB Cluster, and is no longer supported. (Bug #32106576)
As part of work previously done in NDB 8.0, the metadata check performed as part of auto-synchronization between the representation of an NDB table in the NDB dictionary and its counterpart in the MySQL data dictionary has been extended to include, in addition to table-level properties, the properties of columns, indexes, and foreign keys. (This check is also made by a debug MySQL server when performing a CREATE TABLE statement, and when opening an NDB table.)
As part of this work, any mismatches found between an object's properties in the NDB dictionary and the MySQL data dictionary are now written to the MySQL error log. The error log message includes the name of the property, its value in the NDB dictionary, and its value in the MySQL data dictionary. If the object is a column, index, or foreign key, the object's type is also indicated in the message. (WL #13412)
The ThreadConfig parameter has been extended with two new thread types, query threads and recovery threads, intended for scaleout of LDM threads. The number of query threads must be a multiple of the number of LDM threads, up to a maximum of 3 times the number of LDM threads.
It is also now possible when setting ThreadConfig to combine the main and rep threads into a single thread by setting either or both of these arguments to 0. When one of these arguments is set to 0 but the other remains set to 1, the resulting combined thread is named main_rep. When both are set to 0, they are combined with the recv thread (assuming that recv is set to 1), and this combined thread is named main_rep_recv. These thread names are those shown when checking the threads table in the ndbinfo information database.
In addition, the maximums for a number of existing thread types have been increased. The new maximums are: LDM threads: 332; TC threads: 128; receive threads: 64; send threads: 64; main threads: 2. (The maximums for query threads and recovery threads are 332 each.) Maximums for other thread types remain unchanged from previous NDB Cluster releases.
Another change related to this work causes NDB to employ mutexes for protecting job buffers when more than 32 block threads are in use. This may cause a slight decrease in performance (roughly 1 to 2 percent), but also results in a decrease in the amount of memory used by very large configurations. For example, a setup with 64 threads which previously used 2 GB of job buffer memory should now require only about 1 GB instead. In our testing this has resulted in an overall improvement (on the order of 5 percent) in the execution of very complex queries.
For more information, see the descriptions of the arguments to the ThreadConfig parameter discussed previously, and of the ndbinfo.threads table. (WL #12532, WL #13219, WL #13338)
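As a hedged illustration of the new query thread type, a config.ini setting along the following lines could be used; the thread counts shown are examples only and should be sized for the host, keeping the query count a multiple of the ldm count as noted above:

    [ndbd default]
    # 4 LDM threads plus 8 query threads (2 per LDM; at most 3 per LDM are allowed)
    # rep=0 combines the main and rep threads into a single main_rep thread
    ThreadConfig=ldm={count=4},query={count=8},tc={count=2},recv={count=2},send={count=2},main={count=1},rep={count=0}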
This release adds the possibility of configuring the threads for multithreaded data nodes (ndbmtd) automatically, by implementing a new data node configuration parameter AutomaticThreadConfig. When set to 1, NDB sets up the thread assignments automatically, based on the number of processors available to applications. If the system does not limit the number of processors, you can do this by setting NumCPUs to the desired number. Automatic thread configuration makes it unnecessary to set any values for ThreadConfig or MaxNoOfExecutionThreads in config.ini; if AutomaticThreadConfig is enabled, settings for either of these parameters are not used.
As part of this work, a set of tables providing information about hardware and CPU availability and usage by NDB data nodes have been added to the ndbinfo information database. These tables, along with a brief description of the information provided by each, are listed here:
cpudata: CPU usage during the past second
cpudata_1sec: CPU usage per second over the past 20 seconds
cpudata_20sec: CPU usage per 20-second interval over the past 400 seconds
cpudata_50ms: CPU usage per 50-millisecond interval during the past second
cpuinfo: The CPU on which the data node executes
hwinfo: The hardware on the host where the data node resides
Not all of the tables listed are available on all platforms supported by NDB Cluster:
The cpudata, cpudata_1sec, cpudata_20sec, and cpudata_50ms tables are available only on Linux and Solaris operating systems.
The cpuinfo table is not available on FreeBSD or macOS.
(WL #13980)
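A minimal config.ini sketch enabling this feature is shown here; the NumCPUs value of 8 is illustrative, and the parameter is needed only when the operating system does not itself restrict the processors available to the data node:

    [ndbd default]
    # Let NDB assign thread types and counts automatically
    AutomaticThreadConfig=1
    # Optional cap on the number of processors considered
    NumCPUs=8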
Added statistical information to the DBLQH block, which is used to track key lookups and scans, as well as queries coming from DBTC and DBSPJ. By detecting situations in which the load is high but there is no actual need to decrease the number of rows scanned per realtime break, rather than checking the size of job buffer queues to decide how many rows to scan, this makes it possible to scan more rows when there is no CPU congestion. This helps improve performance and realtime behaviour when handling high loads. (WL #14081)
A new method for handling table partitions and fragments is introduced, such that the number of local data managers (LDMs) for a given data node can be determined independently of the number of redo log parts, and the number of LDMs can now be highly variable. NDB employs this method when the ClassicFragmentation data node configuration parameter, implemented as part of this work, is set to false. When this is done, the number of LDMs is no longer used to determine how many partitions to create for a table per data node; instead, the PartitionsPerNode parameter, also introduced in this release, determines this number, which is used for calculating how many fragments a table should have.
When ClassicFragmentation has its default value true, the traditional method of using the number of LDMs to determine how many fragments a table should have continues to be used.
For more information, see Multi-Threading Configuration Parameters (ndbmtd). (WL #13930, WL #14107)
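The following config.ini sketch shows how the two new parameters might be combined; the PartitionsPerNode value of 4 is purely illustrative:

    [ndbd default]
    # Stop deriving the per-node partition count from the number of LDMs
    ClassicFragmentation=false
    # Create this many partitions per data node instead
    PartitionsPerNode=4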
macOS: Removed a number of compiler warnings which occurred when building NDB for Mac OS X. (Bug #31726693)
Microsoft Windows: Removed compiler warning C4146 (unary minus operator applied to unsigned type, result still unsigned), raised by Visual Studio 2013 for storage\ndb\src\kernel\blocks\dbacc\dbaccmain.cpp. (Bug #23130016)
Solaris: Due to a source-level error, atomic_swap_32(), although intended for use in Solaris builds of NDB Cluster, was not actually used in such builds. (Bug #31765608)
NDB Cluster APIs: Removed redundant usage of strlen() in the implementation of NdbDictionary and related internal classes in the NDB API. (Bug #100936, Bug #31930362)
MySQL NDB ClusterJ: When a DomainTypeHandler was instantiated by a SessionFactory, it was stored locally in a static map, typeToHandlerMap. If multiple, distinct SessionFactories for separate connections to the data nodes were obtained by a ClusterJ application, the static typeToHandlerMap was shared by all those factories. When one of the SessionFactories was closed, the connections it created were closed and any tables opened by the connections were cleared from the NDB API global cache. However, the typeToHandlerMap was not cleared, and through it the other SessionFactories kept accessing the DomainTypeHandlers of tables that had already been cleared. These obsolete DomainTypeHandlers contained invalid NdbTable references, and any NDB API calls using those table references ended up with errors.
This patch fixes the issue by making the typeToHandlerMap and the related proxyInterfacesToDomainClassMap maps local to a SessionFactory, so that they are cleared when the SessionFactory is closed. (Bug #31710047)
MySQL NDB ClusterJ: Setting com.mysql.clusterj.connection.pool.size=0 made connections to an NDB Cluster fail. With this fix, setting com.mysql.clusterj.connection.pool.size=0 disables connection pooling as expected, so that every request for a SessionFactory results in the creation of a new factory, and separate connections to the cluster can be created using the same connection string. (Bug #21370745, Bug #31721416)
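A hedged sketch of a ClusterJ properties file using this setting follows; the connect string and database name are placeholders:

    # clusterj.properties (illustrative values)
    com.mysql.clusterj.connectstring=mgmhost:1186
    com.mysql.clusterj.database=test
    # 0 disables connection pooling; each SessionFactory then opens its own cluster connection
    com.mysql.clusterj.connection.pool.size=0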
When calling disk_page_abort_prealloc(), the callback from this internal function was ignored, and so removal of the operation record for the LQHKEYREQ signal proceeded without waiting. This left the table subject to removal before the callback had completed, leading to a failure in PGMAN when the page was retrieved from disk.
To avoid this, we add an extra usage count for the table especially for this page cache miss; this count is decremented as soon as the page cache miss returns. This means that we guarantee that the table is still present when returning from the disk read. (Bug #32146931)
When a table was created, it was possible for a fragment of the table to be checkpointed too early during the next local checkpoint. This meant that Prepare Phase LCP writes were still being performed when the LCP completed, which could lead to problems with subsequent ALTER TABLE statements on the table just created. Now we wait for any potential Prepare Phase LCP writes to finish before the LCP is considered complete. (Bug #32130918)
Using the maximum size of an index key supported by index statistics (3056 bytes) caused buffer issues in data nodes. (Bug #32094904)
References: See also: Bug #25038373.
NDB now prefers CLOCK_MONOTONIC, which on Linux is adjusted by frequency changes but is not updated during suspend. On macOS, NDB instead uses CLOCK_UPTIME_RAW, which behaves in the same way, except that it is not affected by any adjustments.
In addition, when initializing NdbCondition, the monotonic clock to use is taken directly from NdbTick, rather than re-executing the same preprocessor logic used by NdbTick. (Bug #32073826)
ndb_restore terminated unexpectedly when run with the --decrypt option on big-endian systems. (Bug #32068854)
When the data node receive thread found that the job buffer was too full to receive, nothing was done to ensure that, the next time it checked, it resumed receiving from the transporter at the same point at which it stopped previously. (Bug #32046097)
The metadata check failed during auto-synchronization of tables restored using the ndb_restore tool. This was a timing issue relating to indexes, and was found in the following two scenarios encountered when a table had been selected for auto-synchronization:
When the indexes had not yet been created in the NDB dictionary
When the indexes had been created, but were not yet usable
(Bug #32004637)
Optimized sending of packed signals by registering the kernel blocks affected and the sending functions which need to be called for each one in a data structure rather than looking up this information each time. (Bug #31936941)
When two data definition language statements—one on a database and another on a table in the same schema—were run in parallel, it was possible for a deadlock to occur. The DDL statement affecting the database acquired the global schema lock first, but before it could acquire a metadata lock on the database, the statement affecting the table acquired an intention-exclusive metadata lock on the schema. The table DDL statement was thus waiting for the global schema lock to upgrade its metadata lock on the table to an exclusive lock, while the database DDL statement waited for an exclusive metadata lock on the database, leading to a deadlock.
A similar type of deadlock involving tablespaces and tables was already known to occur; NDB already detected and resolved that issue. The current fix extends that logic to handle databases and tables as well, to resolve the problem. (Bug #31875229)
Clang 8 raised a warning due to an uninitialized variable. (Bug #31864792)
An empty page acquired for an insert did not receive a log sequence number. This is necessary in case the page was used previously and thus required undo log execution before being used again. (Bug #31859717)
No reason was provided when rejecting an attempt to perform an in-place ALTER TABLE ... ADD PARTITION statement on a fully replicated table. (Bug #31809290)
When the master node had recorded a more recent GCI than a node starting up which had performed an unsuccessful restart, subsequent restarts of the latter could not be performed because it could not restore the stated GCI. (Bug #31804713)
When using 3 or 4 fragment replicas, it is possible to add more than one node at a time, which means that DBLQH and DBDIH can have distribution keys based on numbers of fragment replicas that differ by up to 3 (that is, MAX_REPLICAS - 1), rather than by only 1. (Bug #31784934)
It was possible in DBLQH for an ABORT signal to arrive from DBTC before it received an LQHKEYREF signal from the next local query handler. Now in such cases, the out-of-order ABORT signal is ignored. (Bug #31782578)
NDB did not correctly handle the case in which an ALTER TABLE ... COMMENT="..." statement did not specify ALGORITHM=COPY. (Bug #31776392)
It was possible in some cases to miss the end point of undo logging for a fragment. (Bug #31774459)
ndb_print_sys_file did not work correctly with version 2 of the sysfile format that was introduced in NDB 8.0.18. (Bug #31726653)
References: See also: Bug #31828452.
DBLQH could not handle the case in which identical operation records having the same transaction ID came from different transaction coordinators. This led to locked rows persisting after a node failure, which kept node recovery from completing. (Bug #31726568)
It is possible for DBDIH to receive a local checkpoint having a given ID to restore while a later LCP is actually used instead, but when performing a partial LCP in such cases, the DIH block was not fully synchronized with the ID of the LCP used. (Bug #31726514)
In most cases, when searching a hash index, the row is used to read the primary key, but when the row has not yet been committed the primary key may be read from the copy row. If the row has been deleted, it can no longer be used to read the primary key. Previously in such cases, the primary key was treated as a NULL, but this could lead to making a comparison using uninitialised data.
Now when this occurs, the comparison is made only if the row has not been deleted; otherwise, a check for the primary key is made among the operations in the serial queue. If no operation has the primary key, then any comparison can be reported as not equal, since no entry in the parallel queue can reinsert the row. This check is needed because, if an entry in the serial queue is an insert, the primary key from this operation must be identified as such to preclude inserting the same primary key twice. (Bug #31688797)
As with writing redo log records, when the file currently used for writing global checkpoint records becomes full, writing switches to the next file. This switch is not supposed to occur until the new file is actually ready to receive the records, but no check was made to ensure that this was the case. This could lead to an unplanned data node shutdown when restoring data from a backup using ndb_restore. (Bug #31585833)
Release of shared global memory when it is no longer required by the DBSPJ block now occurs more quickly than previously. (Bug #31321518)
References: See also: Bug #31231286.
Stopping 3 nodes out of 4 in a single node group using kill -9 caused an unplanned cluster shutdown. To keep this from happening under such conditions, NDB now ensures that any node group that has not had any node failures is viewed by arbitration checks as fully viable. (Bug #31245543)
Multi-threaded index builds could sometimes attempt to use an internal function disallowed to them. (Bug #30587462)
While adding new data nodes to the cluster, and while the management node was restarting with an updated configuration file, some data nodes terminated unexpectedly with the error virtual void TCP_Transporter::resetBuffers(): Assertion `!isConnected()' failed. (Bug #30088051)
It was not possible to execute TRUNCATE TABLE or DROP TABLE for the parent table of a foreign key with foreign_key_checks set to 0. (Bug #97501, Bug #30509759)
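The sequence of statements now working as expected is sketched here; the table name parent is hypothetical and stands for any table referenced by a foreign key:

    SET foreign_key_checks = 0;
    -- Previously rejected for NDB when another table held a foreign key referencing parent
    TRUNCATE TABLE parent;
    SET foreign_key_checks = 1;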
Optimized the internal NdbReceiver::unpackNdbRecord() method, which is used to convert rows retrieved from the data nodes from packed wire format to the NDB API row format. Prior to the change, roughly 13% of CPU usage for executing a join occurred within this method; this was reduced to approximately 8%. (Bug #95007, Bug #29640755)