MySQL NDB Cluster 8.0 Release Notes
Important Change: The default value for the ndb_autoincrement_prefetch_sz server system variable has been increased to 512. (Bug #30316314)
Important Change: NDB now supports more than 2 fragment replicas (up to a maximum of 4). Setting NoOfReplicas=3 or NoOfReplicas=4 is now fully covered in our internal testing and thus supported for use in production. (Bug #97479, Bug #97579, Bug #25261716, Bug #30501414, Bug #30528105, WL #8426)
Important Change: Added the TransactionMemory data node configuration parameter which simplifies configuration of data node memory allocation for transaction operations. This is part of ongoing work on pooling of transactional and Local Data Manager (LDM) memory.
The following parameters are incompatible with TransactionMemory and cannot be set in the config.ini configuration file if this parameter has been set:
If you attempt to set any of these incompatible parameters concurrently with TransactionMemory, the cluster management server cannot start.
For more information, see the description of the TransactionMemory parameter and Parameters incompatible with TransactionMemory. See also Data Node Memory Management, for information about how memory resources are allocated by NDB Cluster data nodes. (Bug #96995, Bug #30344471, WL #12687)
Important Change: The maximum or default values for several NDB Cluster data node configuration parameters have been changed in this release. These changes are listed here:
The maximum value for DataMemory is increased from 1 terabyte to 16 TB.
The maximum value for DiskPageBufferMemory is also increased from 1 TB to 16 TB.
The default value for StringMemory is decreased to 5 percent. Previously, this was 25 percent.
The default value for LcpScanProgressTimeout is increased from 60 seconds to 180 seconds.
(WL #13382)
Performance: Read from any fragment replica, which greatly improves the performance of table reads at a very low cost to table write performance, is now enabled by default for all NDB tables. This means both that the default value for the ndb_read_backup system variable is now ON, and that the value of the NDB_TABLE comment option READ_BACKUP is 1 when creating a new NDB table. (Previously, the default values were OFF and 0, respectively.)
For more information, see Setting NDB Comment Options, as well as the description of the ndb_read_backup system variable. (WL #13383)
NDB Disk Data: The latency of checkpoints for Disk Data files has been reduced when using non-volatile memory devices such as solid-state drives (especially those using NVMe for data transfer), separate physical drives for Disk Data files, or both. As part of this work, two new data node configuration parameters, listed here, have been introduced:
MaxDiskDataLatency sets a maximum on allowed latency for disk access, aborting transactions exceeding this amount of time to complete
DiskDataUsingSameDisk makes it possible to take advantage of keeping Disk Data files on separate disks by increasing the rate at which Disk Data checkpoints can be made
This release also adds three new tables to the ndbinfo database. These tables, listed here, can assist with performance monitoring of Disk Data checkpointing:
diskstat provides information about Disk Data tablespace reads, writes, and page requests during the previous 1 second
diskstats_1sec provides information similar to that given by the diskstat table, but does so for each of the last 20 seconds
pgman_time_track_stats reports on the latency of disk operations affecting Disk Data tablespaces
For additional information, see Disk Data latency parameters. (WL #12924)
Added the ndb_metadata_sync server system variable, which simplifies knowing when metadata synchronization has completed successfully. Setting this variable to true triggers immediate synchronization of all changes between the NDB dictionary and the MySQL data dictionary without regard to any values set for ndb_metadata_check or ndb_metadata_check_interval. When synchronization has completed, its value is automatically reset to false. (Bug #30406657)
Added the DedicatedNode parameter for data nodes, API nodes, and management nodes. When set to true, this parameter prevents the management server from handing out this node's node ID to any node that does not request it specifically. Intended primarily for testing, this parameter may be useful in cases in which multiple management servers are running on the same host, and using the host name alone is not sufficient for distinguishing among processes of the same type. (Bug #91406, Bug #28239197)
A stack trace is now written to the data node log on abnormal termination of a data node. (WL #13166)
Automatic synchronization of metadata from the MySQL data dictionary to NDB now includes databases containing NDB tables. With this enhancement, if a table exists in NDB, and the table and the database it belongs to do not exist on a given SQL node, it is no longer necessary to create the database manually. Instead, the database, along with all NDB tables belonging to this database, should be created on the SQL node automatically. (WL #13490)
Incompatible Change: ndb_restore no longer restores shared users and grants to the mysql.ndb_sql_metadata table by default. A new command-line option --include-stored-grants is added to override this behavior and enable restoring of shared user and grant data and metadata.
As part of this fix, ndb_restore can now also correctly handle an ordered index on a system table. (Bug #30237657)
References: See also: Bug #29534239, Bug #30459246.
Incompatible Change: The minimum value for the RedoOverCommitCounter data node configuration parameter has been increased from 0 to 1. The minimum value for the RedoOverCommitLimit data node configuration parameter has also been increased from 0 to 1.
You should check the cluster global configuration file and make any necessary adjustments to values set for these parameters before upgrading. (Bug #29752703)
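As a rough aid for that pre-upgrade check, the following Python sketch scans the text of a config.ini file for the two parameters and reports any value below the new minimum. It assumes plain name=value lines, # comments, and case-insensitive parameter names; the function name find_below_minimum and the sample configuration are illustrative only and are not part of NDB Cluster.

```python
import re

# Parameters whose minimum value rose from 0 to 1 in this release.
PARAMS = {"redoovercommitcounter", "redoovercommitlimit"}


def find_below_minimum(config_text, minimum=1):
    """Return (line_number, section, name, value) for each offending setting."""
    problems = []
    section = None
    for lineno, raw in enumerate(config_text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop comments (assumed syntax)
        if not line:
            continue
        header = re.fullmatch(r"\[(.+)\]", line)
        if header:
            section = header.group(1).strip()
            continue
        if "=" not in line:
            continue
        name, value = (part.strip() for part in line.split("=", 1))
        if name.lower() in PARAMS and value.isdigit() and int(value) < minimum:
            problems.append((lineno, section, name, int(value)))
    return problems


# Hypothetical configuration used only to exercise the check.
sample = """\
[ndbd default]
NoOfReplicas=2
RedoOverCommitCounter=0
RedoOverCommitLimit=20

[ndbd]
NodeId=2
RedoOverCommitLimit=0
"""

for lineno, section, name, value in find_below_minimum(sample):
    print(f"line {lineno} [{section}]: {name}={value} is below the new minimum of 1")
```

The scan works line by line rather than through a general INI parser because a cluster configuration file may repeat section names such as [ndbd].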
macOS: On macOS, SQL nodes sometimes shut down unexpectedly during the binary log setup phase when starting the cluster. This occurred when there existed schemas whose names used uppercase letters and lower_case_table_names was set to 2. This caused acquisition of metadata locks to be attempted using keys having the incorrect lettercase, and, subsequently, these locks to fail. (Bug #30192373)
Microsoft Windows; NDB Disk Data: On Windows, restarting a data node other than the master when using Disk Data tables led to a failure in TSMAN. (Bug #97436, Bug #30484272)
Solaris: When debugging, ndbmtd consumed all available swap space on Solaris 11.4 SRU 12 and later. (Bug #30446577)
Solaris: The byte order used for numeric values stored in the mysql.ndb_sql_metadata table was incorrect on Solaris/Sparc. This could be seen when using ndb_select_all or ndb_restore --print. (Bug #30265016)
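To illustrate why byte order matters for stored numeric values, this small Python sketch packs the same integer in big-endian order (the native order on SPARC) and in little-endian order (the native order on x86), then reads one as the other. It is not NDB code and makes no claim about which order the ndb_sql_metadata table actually uses.

```python
import struct

value = 1
big = struct.pack(">I", value)     # big-endian 32-bit unsigned integer
little = struct.pack("<I", value)  # little-endian 32-bit unsigned integer

print(big.hex())     # 00000001
print(little.hex())  # 01000000

# Reading big-endian bytes as if they were little-endian gives another number.
misread = struct.unpack("<I", big)[0]
print(misread)       # 16777216
```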
NDB Disk Data: After dropping a disk data table on one SQL node, trying to execute a query against INFORMATION_SCHEMA.FILES on a different SQL node stalled at Waiting for tablespace metadata lock. (Bug #30152258)
References: See also: Bug #29871406.
NDB Disk Data: ALTER TABLESPACE ... ADD DATAFILE could sometimes hang while trying to acquire a metadata lock. (Bug #29871406)
NDB Disk Data: Compatibility code for the Version 1 disk format used prior to the introduction of the Version 2 format in NDB 7.6 turned out not to be necessary, and is no longer used.
Work done in NDB 8.0.18 to allow more nodes introduced long signal variants of several signals taking a bitmask as one of their arguments, and we started using these new long signal variants even if the previous (still supported) short variants would have been sufficient. This introduced several new opportunities for hitting out of LongMessageBuffer errors.
To avoid this, now in such cases we use the short signal variants wherever possible. Some of the signals affected include CM_REGCONF, CM_REGREF, FAIL_REP, NODE_FAILREP, ISOLATE_ORD, COPY_GCIREQ, START_RECREQ, NDB_STARTCONF, and START_LCP_REQ. (Bug #30708009)
References: See also: Bug #30707970.
The fix made in NDB 8.0.18 for an issue in which a transaction was committed prematurely aborted the transaction if the table definition had changed midway, but failed in testing to free memory allocated by getExtraMetadata(). Now this memory is properly freed before aborting the transaction. (Bug #30576983)
References: This issue is a regression of: Bug #29911440.
Excessive allocation of attribute buffer when initializing data in DBTC led to preallocation of API connection records failing due to unexpectedly running out of memory. (Bug #30570264)
Improved error handling in the case where NDB attempted to update a local user having the NDB_STORED_USER privilege but which could not be found in the ndb_sql_metadata table. (Bug #30556487)
Failure of a transaction during execution of an ALTER TABLE ... ALGORITHM=COPY statement following the rename of the new table to the name of the original table but before dropping the original table caused mysqld to exit prematurely. (Bug #30548209)
Non-MSI builds on Windows using -DWITH_NDBCLUSTER did not succeed unless the WiX toolkit was installed. (Bug #30536837)
The allowed_values output from ndb_config --xml --configinfo for the Arbitration data node configuration parameter in NDB 8.0.18 was not consistent with that obtained in previous releases. (Bug #30529220)
References: See also: Bug #30505003.
A faulty ndbrequire() introduced when implementing partial local checkpoints assumed that m_participatingLQH must be clear when receiving START_LCP_REQ, which is not necessarily true when a failure happens for the master after sending START_LCP_REQ and before handling any START_LCP_CONF signals. (Bug #30523457)
A local checkpoint sometimes hung when the master node failed while sending an LCP_COMPLETE_REP signal and it was sent to some nodes, but not all of them. (Bug #30520818)
The management server did not handle all cases of NODE_FAILREP correctly. (Bug #30520066)
With SharedGlobalMemory set to 0, some resources did not meet required minimums. (Bug #30411835)
Execution of ndb_restore --rebuild-indexes together with the --rewrite-database and --exclude-missing-tables options did not create indexes for any tables in the target database. (Bug #30411122)
When writing the schema operation into the ndb_schema table failed, the states in the NDB_SCHEMA object were not cleared, which led to the SQL node shutting down when it tried to free the object. (Bug #30402362)
References: See also: Bug #30371590.
When synchronizing extent pages it was possible for the current local checkpoint (LCP) to stall indefinitely if a CONTINUEB signal for handling the LCP was still outstanding when receiving the FSWRITECONF signal for the last page written in the extent synchronization page. The LCP could also be restarted if another page was written from the data pages. It was also possible that this issue caused PREP_LCP pages to be written at times when they should not have been. (Bug #30397083)
If a transaction was aborted while getting a page from the disk page buffer and the disk system was overloaded, the transaction hung indefinitely. This could also cause restarts to hang and node failure handling to fail. (Bug #30397083, Bug #30360681)
References: See also: Bug #30152258.
Data node failures with the error Another node failed during system restart... occurred during a partial restart. (Bug #30368622)
Automatic synchronization could potentially trigger an increase in the number of locks being taken on a particular metadata object at a given time, such as when a synchronization attempt coincided with a DDL or DML statement involving the same metadata object; competing locks could lead to the NDB deadlock detection logic penalizing the user action rather than the background synchronization. We fix this by changing all exclusive metadata lock acquisition attempts during auto-synchronization so that they use a timeout of 0 (rather than the 10 seconds previously allowed), which avoids deadlock detection and gives priority to the user action. (Bug #30358470)
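The effect of a zero timeout can be pictured with an ordinary mutex: the background task tries the lock once and gives up immediately if a user action already holds it. The following Python sketch is only an analogy built on threading.Lock; the names metadata_lock and try_background_sync are hypothetical and do not correspond to NDB or MySQL server internals.

```python
import threading

metadata_lock = threading.Lock()


def try_background_sync(lock):
    """Attempt the lock without waiting; skip this round if it is held."""
    if not lock.acquire(blocking=False):
        return "skipped"
    try:
        return "synchronized"
    finally:
        lock.release()


# While a user action holds the lock, the background attempt backs off.
with metadata_lock:
    print(try_background_sync(metadata_lock))  # skipped

# Once the lock is free, the next attempt goes through.
print(try_background_sync(metadata_lock))      # synchronized
```

Because the background attempt never waits, it cannot take part in a wait cycle with the user action, which is the property the fix relies on.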
If a SYNC_EXTENT_PAGES_REQ signal was received by PGMAN while dropping a log file group as part of a partial local checkpoint, and thus dropping the page locked by this block for processing next, the LCP terminated due to trying to access the page after it had already been dropped. (Bug #30305315)
The wrong number of bytes was reported in the cluster log for a completed local checkpoint. (Bug #30274618)
References: See also: Bug #29942998.
Added the new ndb_mgm client debugging commands DUMP 2356 and DUMP 2357. (Bug #30265415)
Executing ndb_drop_table using the --help option caused this program to terminate prematurely, and without producing any help output. (Bug #30259264)
A mysqld trying to connect to the cluster, and thus trying to acquire the global schema lock (GSL) during setup, ignored the setting for ndb-wait-setup and hung indefinitely when the GSL had already been acquired by another mysqld, such as when it was executing an ALTER TABLE statement. (Bug #30242141)
When a table containing a self-referential foreign key (in other words, a foreign key referencing another column of the same table) was altered using the COPY algorithm, the foreign key definition was removed. (Bug #30233405)
In MySQL 8.0, names of foreign keys not explicitly provided by the user are generated automatically in the SQL layer and stored in the data dictionary. Such names are of the form [table_name]_ibfk_[#], which aligns with the names generated by the InnoDB storage engine in MySQL 5.7. NDB 8.0.18 introduced a change in behavior by NDB such that it also uses the generated names, but in some cases, such as when tables were renamed, NDB still generated and used its own format for such names internally rather than those generated by the SQL layer and stored in the data dictionary, which led to the following issues:
Discrepancies in SHOW CREATE TABLE output and the contents of INFORMATION_SCHEMA.REFERENTIAL_CONSTRAINTS
Improper metadata locking for foreign keys
Confusing names for foreign keys in error messages
Now NDB also renames the foreign keys in such cases, using the names provided by the MySQL server, to align fully with those used by InnoDB. (Bug #30210839)
References: See also: Bug #96508, Bug #30171959.
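The generated name form described in this entry, a table name followed by _ibfk_ and a number, can be sketched in a few lines of Python. The helper names below are hypothetical, and the sketch only builds and recognizes strings of that shape; it does not reproduce how the server chooses the number.

```python
import re


def generated_fk_name(table_name, ordinal):
    """Build a name of the form <table_name>_ibfk_<number>."""
    return f"{table_name}_ibfk_{ordinal}"


_GENERATED = re.compile(r"^(?P<table>.+)_ibfk_(?P<ordinal>\d+)$")


def parse_generated_fk_name(name):
    """Return (table_name, ordinal) if the name has the generated form, else None."""
    match = _GENERATED.match(name)
    if match is None:
        return None
    return match.group("table"), int(match.group("ordinal"))


print(generated_fk_name("child", 1))            # child_ibfk_1
print(parse_generated_fk_name("child_ibfk_1"))  # ('child', 1)
print(parse_generated_fk_name("fk_parent"))     # None
```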
When a table referenced by a foreign key was renamed, participating SQL nodes did not properly update the foreign key definitions for the referencing table in their data dictionaries during schema distribution. (Bug #30191068)
Data node handling of failures of other data nodes could sometimes not be synchronized properly, such that two or more data nodes could see different nodes as the master node. (Bug #30188414)
Some scan operations failed due to the presence of an old assert in DbtupBuffer.cpp that checked whether API nodes were using a version of the software previous to NDB 6.4. This was no longer necessary or correct, and has been removed. (Bug #30188411)
When executing a global schema lock (GSL), NDB used a single Ndb_table_guard object for successive retries when attempting to obtain a table object reference; it was not possible for this to succeed after failing on the first attempt, since Ndb_table_guard assumes that the underlying object pointer is determined once only, at initialisation, with the previously retrieved pointer being returned from a cached reference thereafter.
This resulted in infinite waits to obtain the GSL, causing the binlog injector thread to hang so that mysqld considered all NDB tables to be read-only. To avoid this problem, NDB now uses a fresh instance of Ndb_table_guard for each such retry. (Bug #30120858)
References: This issue is a regression of: Bug #30086352.
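The caching behavior behind this bug can be shown with a toy model. In the Python sketch below, TableGuard is a hypothetical stand-in that resolves its reference once in its constructor, in the way the entry describes for Ndb_table_guard; none of these names or signatures are taken from the NDB source, and the retry loops are bounded here purely so the example terminates.

```python
class TableGuard:
    """Toy guard that resolves its table reference once, at construction."""

    def __init__(self, lookup):
        # The lookup runs exactly once; the result is cached from then on.
        self._table = lookup()

    def get_table(self):
        return self._table


def make_flaky_lookup(failures):
    """Return a lookup that yields None `failures` times, then a table name."""
    state = {"calls": 0}

    def lookup():
        state["calls"] += 1
        return None if state["calls"] <= failures else "ndb_schema"

    return lookup


def retry_with_shared_guard(lookup, attempts):
    guard = TableGuard(lookup)  # created once, outside the loop
    for _ in range(attempts):
        if guard.get_table() is not None:
            return guard.get_table()
    return None  # every retry sees the same cached failure


def retry_with_fresh_guard(lookup, attempts):
    for _ in range(attempts):
        guard = TableGuard(lookup)  # new instance, so the lookup runs again
        if guard.get_table() is not None:
            return guard.get_table()
    return None


print(retry_with_shared_guard(make_flaky_lookup(1), attempts=3))  # None
print(retry_with_fresh_guard(make_flaky_lookup(1), attempts=3))   # ndb_schema
```

With a shared guard, a first failed lookup is cached and no number of retries can succeed; creating the guard inside the loop gives each retry its own lookup.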
When upgrading an SQL node to NDB 8.0 from a previous release series, the .frm file whose contents are read and then installed in the data dictionary does not contain any information about foreign keys. This meant that foreign key information was not installed in the SQL node's data dictionary. This is fixed by using the foreign key information available in the NDB data dictionary to update the local MySQL data dictionary during table metadata upgrade. (Bug #30071043)
Restoring tables with the --disable-indexes option resulted in the wrong table definition being installed in the MySQL data dictionary. This is because the serialized dictionary information (SDI) packed into the NDB dictionary's table definition is used to create the table object; the SDI definition is updated only when the DDL change is done through the MySQL server. Installation of the wrong table definition meant that the table could not be opened until the indexes were re-created in the NDB dictionary again using --rebuild-indexes.
This is fixed by extending auto-synchronization such that it compares the SDI to the NDB dictionary table information and fails in cases in which the column definitions do not match. Mismatches involving indexes only are treated as temporary errors, with the table in question being detected again during the next round of change detection. (Bug #30000202, Bug #30414514)
Restoring tables for which MAX_ROWS was used to alter partitioning from a backup made from NDB 7.4 to a cluster running NDB 7.6 did not work correctly. This is fixed by ensuring that the upgrade code handling PartitionBalance supplies a valid table specification to the NDB dictionary. (Bug #29955656)
The number of data bytes for the summary event written in the cluster log when a backup completed was truncated to 32 bits, so that there was a significant mismatch between the number of log records and the number of data records printed in the log for this event. (Bug #29942998)
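A short worked example shows how large the discrepancy from a 32-bit truncation can be. The Python sketch below assumes the value was reduced modulo 2^32 (unsigned truncation), which the entry does not spell out; the 5 GiB figure is an arbitrary illustration.

```python
def truncate_to_32_bits(n):
    """Keep only the low 32 bits of a non-negative integer."""
    return n & 0xFFFFFFFF


actual_bytes = 5 * 1024**3  # 5 GiB = 5368709120 bytes
reported = truncate_to_32_bits(actual_bytes)

print(actual_bytes)  # 5368709120
print(reported)      # 1073741824
```

Under that assumption a 5 GiB backup would be logged as 1 GiB, since 5368709120 minus 4294967296 leaves 1073741824.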
mysqld sometimes aborted during a long ALTER TABLE operation that timed out. (Bug #29894768)
References: See also: Bug #29192097.
When an SQL node connected to NDB, it did not know whether it had previously connected to that cluster, and thus could not determine whether its data dictionary information was merely out of date, or completely invalid. This issue is solved by adding a unique schema version identifier (schema UUID) to the ndb_schema table in NDB as well as to the ndb_schema table object in the data dictionary. Now, whenever a mysqld connects to a cluster as an SQL node, it can compare the schema UUID stored in its data dictionary against that which is stored in the ndb_schema table, and so know whether it is connecting for the first time. If so, the SQL node removes any entries that may be in its data dictionary. (Bug #29894166)
References: See also: Bug #27543602.
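The comparison described in this entry amounts to a simple decision, sketched here in Python. The function reconcile_on_connect, its arguments, and the placeholder identifier are all hypothetical; the sketch models only the rule stated above and not the actual data dictionary handling in mysqld.

```python
def reconcile_on_connect(local_uuid, cluster_uuid, local_entries):
    """Return the (schema UUID, dictionary entries) to keep after connecting."""
    if local_uuid == cluster_uuid:
        # Same cluster as before: local entries may be stale but are kept.
        return local_uuid, dict(local_entries)
    # Different or missing UUID: treat as a first connection and start clean.
    return cluster_uuid, {}


cluster_uuid = "a3f1c2d4"  # hypothetical placeholder value

# A node that has connected to this cluster before keeps its entries.
print(reconcile_on_connect("a3f1c2d4", cluster_uuid, {"db1.t1": "v7"}))

# A node with no stored UUID discards whatever entries it has.
print(reconcile_on_connect(None, cluster_uuid, {"db1.t1": "v7"}))
```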
Improved log messages generated by table discovery and table metadata upgrades. (Bug #29894127)
Using 2 LDM threads on a 2-node cluster with 10 threads per node could result in a partition imbalance, such that one of the LDM threads on each node was the primary for zero fragments. Trying to restore a multi-threaded backup from this cluster failed because the datafile for one LDM contained only the 12-byte data file header, which ndb_restore was unable to read. The same problem could occur in other cases, such as when taking a backup immediately after adding an empty node online.
It was found that this occurred when ODirect was enabled for an EOF backup data file write whose size was less than 512 bytes and the backup was in the STOPPING state. This normally occurs only for an aborted backup, but could also happen for a successful backup for which an LDM had no fragments. We fix the issue by introducing an additional check to ensure that writes are skipped only if the backup actually contains an error which should cause it to abort. (Bug #29892660)
References: See also: Bug #30371389.
For NDB tables, ALTER TABLE ... ALTER INDEX did not work with ALGORITHM=INPLACE. (Bug #29700197)
ndb_restore failed in testing on 32-bit platforms. This issue is fixed by increasing the size of the thread stack used by this tool from 64 KB to 128 KB. (Bug #29699887)
References: See also: Bug #30406046.
An unplanned shutdown of the cluster occurred due to an error in DBTUP while deleting rows from a table following an online upgrade. (Bug #29616383)
In some cases the SignalSender class, used as part of the implementation of ndb_mgmd and ndbinfo, buffered excessive numbers of unneeded SUB_GCP_COMPLETE_REP and API_REGCONF signals, leading to unnecessary consumption of memory. (Bug #29520353)
References: See also: Bug #20075747, Bug #29474136.
The setting for the BackupLogBufferSize configuration parameter was not honored. (Bug #29415012)
When mysqld was run with the --upgrade=FORCE option, it reported the following issues:
[Warning] Table 'mysql.ndb_apply_status' requires repair.
[ERROR] Table 'mysql.ndb_apply_status' repair failed.
This was because --upgrade=FORCE causes a bootstrap system thread to run CHECK TABLE FOR UPGRADE, but ha_ndbcluster::open() refused to open the table before schema synchronization had completed, which eventually led to the reported conditions. (Bug #29305977)
References: See also: Bug #29205142.
When using explicit SHM connections, with ShmSize set to a value larger than the system's available shared memory, mysqld hung indefinitely on startup and produced no useful error messages. (Bug #28875553)
The maximum global checkpoint (GCP) commit lag and GCP save timeout are recalculated whenever a node shuts down, to take into account the change in number of data nodes. This could lead to the unintentional shutdown of a viable node when the threshold decreased below the previous value. (Bug #27664092)
References: See also: Bug #26364729.
A transaction which inserts a child row may run concurrently with a transaction which deletes the parent row for that child. One of the transactions should be aborted in this case, lest an orphaned child row result.
Before committing an insert on a child row, a read of the parent row is triggered to confirm that the parent exists. Similarly, before committing a delete on a parent row, a read or scan is performed to confirm that no child rows exist. When insert and delete transactions were run concurrently, their prepare and commit operations could interact in such a way that both transactions committed. This occurred because the triggered reads were performed using LM_CommittedRead locks (see NdbOperation::LockMode), which are not strong enough to prevent such error scenarios.
This problem is fixed by using the stronger LM_SimpleRead lock mode for both triggered reads. The use of LM_SimpleRead rather than LM_CommittedRead locks ensures that at least one transaction aborts in every possible scenario involving transactions which concurrently insert into child rows and delete from parent rows. (Bug #22180583)
Concurrent SELECT and ALTER TABLE statements on the same SQL node could sometimes block one another while waiting for locks to be released. (Bug #17812505, Bug #30383887)
Failure handling in schema synchronization involves pushing warnings and errors to the binary logging thread. Schema synchronization is also retried in case of certain failures which could lead to an accumulation of warnings in the thread. Now such warnings and errors are cleared following each attempt at schema synchronization. (Bug #2991036)
An INCL_NODECONF signal from any local blocks should be ignored when a node has failed, except in order to reset c_nodeStartSlave.nodeId. (Bug #96550, Bug #30187779)
When returning Error 1022, NDB did not print the name of the affected table. (Bug #74218, Bug #19763093)
References: See also: Bug #29700174.