MySQL NDB Cluster 7.6 Release Notes
MySQL NDB Cluster 7.6.4 is a new release of NDB 7.6, based on
MySQL Server 5.7 and including features in version 7.6 of the
NDB storage engine, as well as fixing recently discovered bugs
in previous NDB Cluster releases.
Obtaining NDB Cluster 7.6. NDB Cluster 7.6 source code and binaries can be obtained from https://dev.mysql.com/downloads/cluster/.
For an overview of changes made in NDB Cluster 7.6, see What is New in NDB Cluster 7.6.
This release also incorporates all bug fixes and changes made in previous NDB Cluster releases, as well as all bug fixes and feature changes which were added in mainline MySQL 5.7 through MySQL 5.7.20 (see Changes in MySQL 5.7.20 (2017-10-16, General Availability)).
Incompatible Change; NDB Disk Data:
Due to changes in disk file formats, it is necessary to perform
an --initial restart of each data node when upgrading to or
downgrading from this release.
Important Change; NDB Disk Data: NDB Cluster has improved node restart times and overall performance with larger data sets by implementing partial local checkpoints (LCPs). Prior to this release, an LCP always made a copy of the entire database.
NDB now supports LCPs that write individual records, so it is
no longer strictly necessary for an LCP to write the entire
database. Since, at recovery, it remains necessary to restore
the database fully, the strategy is to save one fourth of all
records at each LCP, as well as to write the records that have
changed since the last LCP.
Two data node configuration parameters relating to this change
are introduced in this release: EnablePartialLcp (default true,
or enabled) enables partial LCPs. When partial LCPs are enabled,
RecoveryWork controls the percentage of space given over to
LCPs; it increases with the amount of work which must be
performed on LCPs during restarts as opposed to that performed
during normal operations. Raising this value causes LCPs during
normal operations to require writing fewer records and so
decreases the usual workload. Raising this value also means
that restarts can take longer.
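For example, both parameters can be set in the [ndbd default]
section of the config.ini file; the values shown here are
purely illustrative and not recommendations:
[ndbd default]
# Partial LCPs are enabled by default; shown here for clarity
EnablePartialLcp=true
# Percentage of storage given over to LCPs (default is 50)
RecoveryWork=60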
Upgrading to NDB 7.6.4 or downgrading from this release
requires purging then re-creating the NDB
data node file system, which means that an initial restart of
each data node is needed. An initial node restart still
requires a complete LCP; a partial LCP is not used for this
purpose.
A rolling restart or system restart is a normal part of an NDB
software upgrade. When such a restart is performed as part of an
upgrade to NDB 7.6.4 or later, any existing LCP files are
checked for the presence of the LCP sysfile, indicating that the
existing data node file system was written using NDB 7.6.4 or
later. If such a node file system exists, but does not contain
the sysfile, and if any data nodes are restarted without the
--initial option, NDB causes the restart to fail with an
appropriate error message. This detection can be performed only
as part of an upgrade; it is not possible to do so as part of a
downgrade to NDB 7.6.3 or earlier from a later release.
Exception: If there are no data node files—that is, in the
event of a “clean” start or restart—using --initial is not
required for a software upgrade, since this is already
equivalent to an initial restart. (This aspect of restarts is
unchanged from previous releases of NDB Cluster.)
In addition, the default value for StartPartitionedTimeout is
changed from 60000 to 0.
This release also deprecates the data node configuration
parameters BackupDataBufferSize, BackupWriteSize, and
BackupMaxWriteSize; these are now subject to removal in a future
NDB Cluster version.
(Bug #27308632, WL #8069, WL #10302, WL #10993)
Important Change:
Added the ndb_perror utility for obtaining information about
NDB Cluster error codes. This tool replaces perror --ndb; the
--ndb option for perror is now deprecated and raises a warning
when used; the option is subject to removal in a future NDB
version.
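For example, the old and new invocations compare as shown here
(the error code used is an arbitrary illustration):
$ perror --ndb 321     # deprecated; now raises a warning
$ ndb_perror 321       # replacement utility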
See ndb_perror — Obtain NDB Error Message Information, for more information. (Bug #81703, Bug #81704, Bug #23523869, Bug #23523926)
References: See also: Bug #26966826, Bug #88086.
NDB Client Programs: NDB Cluster Auto-Installer node configuration parameters as supported in the UI and accompanying documentation were in some cases hard coded to an arbitrary value, or were missing altogether. Configuration parameters, their default values, and the documentation have been better aligned with those found in release versions of the NDB Cluster software.
A necessary part of this task was implementing a mechanism,
which the Auto-Installer now provides, for setting parameters
that take discrete values. For example, the value of the data
node parameter Arbitration must now be one of Default, Disabled,
or WaitExternal.
The Auto-Installer also now obtains the amount of disk space
available to NDB on each host and uses it to derive reasonable
default values for configuration parameters which depend on this
value.
See The NDB Cluster Auto-Installer (NDB 7.5) (NO LONGER SUPPORTED), for more information. (WL #10340, WL #10408, WL #10449)
NDB Client Programs: Secure connection support in the MySQL NDB Cluster Auto-Installer has been updated or improved in this release as follows:
Added a mechanism for setting SSH membership on a per-host basis.
Updated the Paramiko Python module to the most recent available version (2.6.1).
Provided a place in the GUI for encrypted private key passwords, and discontinued use of hardcoded passwords.
Related enhancements implemented in the current release include the following:
Discontinued use of cookies as a persistent store for NDB Cluster configuration information; these were not secure and came with a hard upper limit on storage. Now the Auto-Installer uses an encrypted file for this purpose.
In order to secure data transfer between the web browser front end and the back end web server, the default communications protocol has been switched from HTTP to HTTPS.
See The NDB Cluster Auto-Installer (NDB 7.5) (NO LONGER SUPPORTED), for more information. (WL #10426, WL #11128, WL #11289)
MySQL NDB ClusterJ: ClusterJ now supports CPU binding for receive threads through the setRecvThreadCPUids() and getRecvThreadCPUids() methods. Also, the receive thread activation threshold can be set and get with the setRecvThreadActivationThreshold() and getRecvThreadActivationThreshold() methods. (WL #10815)
It is now possible to specify a set of cores to be used for I/O
threads performing offline multithreaded builds of ordered
indexes, as opposed to normal I/O duties such as file I/O,
compression, or decompression. “Offline” in this context refers
to building of ordered indexes performed when the parent table
is not being written to; such building takes place when an NDB
cluster performs a node or system restart, or as part of
restoring a cluster from backup using ndb_restore
--rebuild-indexes.
In addition, the default behavior for offline index build work is modified to use all cores available to ndbmtd, rather than limiting itself to the core reserved for the I/O thread. Doing so can improve restart and restore times, as well as performance, availability, and the user experience.
This enhancement is implemented as follows:
The default value for BuildIndexThreads is changed from 0 to
128. This means that offline ordered index builds are now
multithreaded by default.
The default value for TwoPassInitialNodeRestartCopy is changed
from false to true. This means that an initial node restart
first copies all data from a “live” node to one that is
starting—without creating any indexes—builds ordered indexes
offline, and then again synchronizes its data with the live
node; that is, it synchronizes twice and builds indexes offline
between the two synchronizations. This causes an initial node
restart to behave more like the normal restart of a node, and
reduces the time required for building indexes.
A new thread type (idxbld) is defined for the ThreadConfig
configuration parameter, to allow locking of offline index build
threads to specific CPUs.
In addition, NDB now distinguishes the thread types that are
accessible to ThreadConfig by the following two criteria:
Whether the thread is an execution thread. Threads of types
main, ldm, recv, rep, tc, and send are execution threads; thread
types io, watchdog, and idxbld are not.
Whether the allocation of the thread to a given task is
permanent or temporary. Currently all thread types except idxbld
are permanent.
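As an illustration only, a ThreadConfig setting in config.ini
that locks offline index build threads to their own CPUs might
look like this (the CPU numbers and thread counts shown here are
hypothetical, not recommendations):
[ndbd default]
# ldm and main threads bound to dedicated CPUs; idxbld threads
# restricted to a separate CPU set used only during index builds
ThreadConfig=ldm={count=4,cpubind=1,2,3,4},main={cpubind=0},idxbld={cpuset=10,11}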
For additional information, see the descriptions of the parameters in the Manual. (Bug #25835748, Bug #26928111)
Added the ODirectSyncFlag configuration parameter for data
nodes. When enabled, the data node treats all completed file
system writes to the redo log as though they had been performed
using fsync.
This parameter has no effect if at least one of the following conditions is true:
ODirect is not enabled.
InitFragmentLogFiles is set to SPARSE.
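For example, both flags could be enabled together in the
[ndbd default] section of config.ini; this fragment is shown
only as an illustration:
[ndbd default]
ODirect=1
ODirectSyncFlag=1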
(Bug #25428560)
Added the ndbinfo.error_messages table, which provides
information about NDB Cluster errors, including error codes,
status types, brief descriptions, and classifications. This
makes it possible to obtain error information using SQL in the
mysql client (or other MySQL client program), like this:
mysql> SELECT * FROM ndbinfo.error_messages WHERE error_code='321';
+------------+----------------------+-----------------+----------------------+
| error_code | error_description | error_status | error_classification |
+------------+----------------------+-----------------+----------------------+
| 321 | Invalid nodegroup id | Permanent error | Application error |
+------------+----------------------+-----------------+----------------------+
1 row in set (0.00 sec)
The query just shown provides equivalent information to that obtained by issuing ndb_perror 321 or (now deprecated) perror --ndb 321 on the command line. (Bug #86295, Bug #26048272)
ThreadConfig now has an additional nosend parameter that can be
used to prevent a main, ldm, rep, or tc thread from assisting
the send threads, by setting this parameter to 1 for the given
thread. By default, nosend is 0. It cannot be used with threads
other than those of the types just listed.
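A minimal sketch of such a setting in config.ini, assuming an
otherwise default thread layout (the thread counts shown are
arbitrary):
[ndbd default]
# Keep the main thread from assisting the send threads
ThreadConfig=main={nosend=1},ldm={count=4},recv={count=2}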
(WL #11554)
When executing a scan as a pushed join, all instances of DBSPJ
were involved in the execution of a single query; some of these
received multiple requests from the same query. This situation
is improved by enabling a single SPJ request to handle a set of
root fragments to be scanned, such that only a single SPJ
request is sent to each DBSPJ instance on each node. Because
batch sizes are allocated per fragment, the multi-fragment scan
can obtain a larger total batch size, allowing for some
scheduling optimizations to be done within DBSPJ, which can scan
a single fragment at a time (giving it the total batch size
allocation), scan all fragments in parallel using smaller
sub-batches, or use some combination of the two.
Since the effect of this change is generally to require fewer SPJ requests and instances, performance of pushed-down joins should be improved in many cases. (WL #10234)
As part of work ongoing to optimize bulk DDL performance by ndbmtd, it is now possible to obtain performance improvements by increasing the batch size for the bulk data parts of DDL operations which process all of the data in a fragment or set of fragments using a scan. Batch sizes are now made configurable for unique index builds, foreign key builds, and online reorganization, by setting the respective data node configuration parameters listed here:
MaxFKBuildBatchSize: Maximum scan batch size used for building
foreign keys.
MaxReorgBuildBatchSize: Maximum scan batch size used for
reorganization of table partitions.
MaxUIBuildBatchSize: Maximum scan batch size used for building
unique keys.
For each of the parameters just listed, the default value is 64, the minimum is 16, and the maximum is 512.
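As an illustration only, all three could be raised from their
defaults in the [ndbd default] section of config.ini (the value
128 is arbitrary, chosen from the permitted 16 to 512 range):
[ndbd default]
MaxFKBuildBatchSize=128
MaxReorgBuildBatchSize=128
MaxUIBuildBatchSize=128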
Increasing the appropriate batch size or sizes can help amortize inter-thread and inter-node latencies and make use of more parallel resources (local and remote) to help scale DDL performance. (WL #11158)
Formerly, the data node LGMAN kernel block processed undo log
records serially; now this is done in parallel. The rep thread,
which hands off undo records to local data handler (LDM)
threads, waited for an LDM to finish applying a record before
fetching the next one; now the rep thread no longer waits, but
proceeds immediately to the next record and LDM.
There are no user-visible changes in functionality directly associated with this work; this performance enhancement is part of the work being done in NDB 7.6 to improve undo log handling for partial local checkpoints. (WL #8478)
When applying an undo log, the table ID and fragment ID are
obtained from the page ID. This was done by reading the page
from PGMAN using an extra PGMAN worker thread, but when applying
the undo log it was necessary to read the page again. This
became very inefficient when using O_DIRECT (see ODirect), since
the page was not cached in the OS kernel.
Mapping from page ID to table ID and fragment ID is now done
using information the extent header contains about the table IDs
and fragment IDs of the pages used in a given extent. Since the
extent pages are always present in the page cache, no extra disk
reads are required to perform the mapping, and the information
can be read using existing TSMAN data structures.
(WL #10194)
Added the NODELOG DEBUG command in the ndb_mgm client to provide
runtime control over data node debug logging. NODELOG DEBUG ON
causes a data node to write extra debugging information to its
node log, the same as if the node had been started with
--verbose. NODELOG DEBUG OFF disables the extra logging.
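For example, in the ndb_mgm client the command can be directed
at a single data node; the node ID shown here is arbitrary:
ndb_mgm> 5 NODELOG DEBUG ON
ndb_mgm> 5 NODELOG DEBUG OFF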
(WL #11216)
Added the LocationDomainId
configuration
parameter for management, data, and API nodes. When using NDB
Cluster in a cloud environment, you can set this parameter to
assign a node to a given availability domain or availability
zone. This can improve performance in the following ways:
If requested data is not found on the same node, reads can be directed to another node in the same availability domain.
Communication between nodes in different availability domains is
guaranteed to use NDB transporters' WAN support without any
further manual intervention.
The transporter's group number can be based on which availability domain is used, so that SQL and other API nodes also communicate with local data nodes in the same availability domain whenever possible.
The arbitrator can be selected from an availability domain in which no data nodes are present, or, if no such availability domain can be found, from a third availability domain.
This parameter takes an integer value between 0 and 16, with 0
being the default; using 0 is the same as leaving
LocationDomainId unset.
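A minimal sketch of how the parameter might be set for a data
node in config.ini (the node ID, host name, and domain number
are all hypothetical):
[ndbd]
NodeId=3
HostName=ndbd-host-a
LocationDomainId=1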
(WL #10172)
Important Change:
The --passwd option for ndb_top is now deprecated. It is removed
(and replaced with --password) in NDB 7.6.5.
(Bug #88236, Bug #20733646)
References: See also: Bug #86615, Bug #26236320, Bug #26907833.
Replication:
With GTIDs generated for incident log events, MySQL error code
1590 (ER_SLAVE_INCIDENT) could not be skipped using the
--slave-skip-errors=1590 startup option on a replication slave.
(Bug #26266758)
NDB Disk Data:
An ALTER TABLE that switched the table storage format between
MEMORY and DISK was always performed in place for all columns.
This is not correct in the case of a column whose storage format
is inherited from the table; the column's storage type is not
changed.
For example, this statement creates a table t1 whose column c2
uses in-memory storage since the table does so implicitly:
CREATE TABLE t1 (c1 INT PRIMARY KEY, c2 INT) ENGINE NDB;
The ALTER TABLE statement shown here is expected to cause c2 to
be stored on disk, but failed to do so:
ALTER TABLE t1 STORAGE DISK TABLESPACE ts1;
Similarly, an on-disk column that inherited its storage format
from the table to which it belonged did not have the format
changed by ALTER TABLE ... STORAGE MEMORY.
These two cases are now performed as a copying alter, and the storage format of the affected column is now changed. (Bug #26764270)
NDB Replication:
On an SQL node not being used for a replication channel, with
sql_log_bin=0, it was possible, after creating and populating an
NDB table, for a table map event to be written to the binary log
for the created table with no corresponding row events. This led
to problems when this log was later used by a slave cluster
replicating from the mysqld where this table was created.
Fixed this by adding support for maintaining a cumulative
any_value bitmap for global checkpoint event operations that
represents bits set consistently for all rows of a specific
table in a given epoch, and by adding a check to determine
whether all operations (rows) for a specific table are all
marked as NOLOGGING, to prevent the addition of this table to
the Table_map held by the binlog injector.
As part of this fix, the NDB API adds a new
getNextEventOpInEpoch3() method which provides information about
any AnyValue received by making it possible to retrieve the
cumulative any_value bitmap.
(Bug #26333981)
ndbinfo Information Database:
Counts of committed rows and committed operations per fragment
used by some tables in ndbinfo were taken from the DBACC block,
but due to the fact that commit signals can arrive out of order,
transient counter values could be negative. This could happen
if, for example, a transaction contained several interleaved
insert and delete operations on the same row; in such cases,
commit signals for delete operations could arrive before those
for the corresponding insert operations, leading to a failure in
DBACC.
This issue is fixed by using the counts of committed rows which
are kept in DBTUP, which do not have this problem.
(Bug #88087, Bug #26968613)
Errors in parsing NDB_TABLE modifiers could cause memory leaks.
(Bug #26724559)
Added DUMP code 7027 to facilitate testing of issues relating to
local checkpoints. For more information, see DUMP 7027.
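DUMP codes such as this one are issued from the ndb_mgm client;
a purely illustrative invocation against all data nodes (any
additional arguments the code may accept are omitted here) is:
ndb_mgm> ALL DUMP 7027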
(Bug #26661468)
A previous fix intended to improve logging of node failure handling in the transaction coordinator included logging of transactions that could occur in normal operation, which made the resulting logs needlessly verbose. Such normal transactions are no longer written to the log in such cases. (Bug #26568782)
References: This issue is a regression of: Bug #26364729.
Due to a configuration file error, CPU locking capability was not available on builds for Linux platforms. (Bug #26378589)
Some DUMP codes used for the LGMAN kernel block were incorrectly
assigned numbers in the range used for codes belonging to DBTUX.
These have now been assigned symbolic constants and numbers in
the proper range (10001, 10002, and 10003).
(Bug #26365433)
Node failure handling in the DBTC kernel block consists of a
number of tasks which execute concurrently, all of which must
complete before TC node failure handling is complete. This fix
extends logging coverage to record when each task completes and
which tasks remain, and includes the following improvements:
Handling of interactions between GCP and node failure handling, in which TC takeover causes a GCP participant stall at the master TC to allow it to extend the current GCI with any transactions that were taken over; the stall can begin and end in different GCP protocol states. Logging coverage is extended to cover all such scenarios, and debug logging is now more consistent and understandable to users.
Logging done by the QMGR block as it monitors the duration of
node failure handling is performed more frequently. A warning
log is now generated every 30 seconds (instead of every minute),
and this now includes DBDIH block debug information (formerly
this was written separately, and less often).
To reduce space used, DBTC instance number: is shortened to
number:DBTC.
A new error code is added to assist testing.
(Bug #26364729)
During a restart, DBLQH loads redo log part metadata for each
redo log part it manages, from one or more redo log files. Since
each file has a limited capacity for metadata, the number of
files which must be consulted depends on the size of the redo
log part. These files are opened, read, and closed sequentially,
but the closing of one file occurs concurrently with the opening
of the next.
In cases where closing of the file was slow, it was possible for
more than 4 files per redo log part to be open concurrently;
since these files were opened using the OM_WRITE_BUFFER option,
more than 4 chunks of write buffer were allocated per part in
such cases. The write buffer pool is not unlimited; if all redo
log parts were in a similar state, the pool was exhausted,
causing the data node to shut down.
This issue is resolved by avoiding the use of OM_WRITE_BUFFER
during metadata reload, so that any transient opening of more
than 4 redo log files per log file part no longer leads to
failure of the data node.
(Bug #25965370)
Following TRUNCATE TABLE on an NDB table, its AUTO_INCREMENT ID
was not reset on an SQL node not performing binary logging.
(Bug #14845851)
A join entirely within the materialized part of a semijoin was
not pushed even if it could have been. In addition, EXPLAIN
provided no information about why the join was not pushed.
(Bug #88224, Bug #27022925)
References: See also: Bug #27067538.
When the duplicate weedout algorithm was used for evaluating a semijoin, the result had missing rows. (Bug #88117, Bug #26984919)
References: See also: Bug #87992, Bug #26926666.
A table used in a loose scan could be used as a child in a pushed join query, leading to possibly incorrect results. (Bug #87992, Bug #26926666)
When representing a materialized semijoin in the query plan, the
MySQL optimizer inserted extra QEP_TAB and JOIN_TAB objects to
represent access to the materialized subquery result. The join
pushdown analyzer did not properly set up its internal data
structures for these, leaving them uninitialized instead. This
meant that later usage of any item objects referencing the
materialized semijoin accessed an uninitialized tableno column
when accessing a 64-bit tableno bitmask, possibly referring to a
point beyond its end, leading to an unplanned shutdown of the
SQL node.
(Bug #87971, Bug #26919289)
In some cases, a SCAN_FRAGCONF signal was received after a
SCAN_FRAGREQ with a close flag had already been sent, clearing
the timer. When this occurred, the next SCAN_FRAGREF to arrive
caused time tracking to fail. Now in such cases, a check for a
cleared timer is performed prior to processing the SCAN_FRAGREF
message.
(Bug #87942, Bug #26908347)
While deleting an element in Dbacc, or moving it during hash
table expansion or reduction, the method used
(getLastAndRemove()) could return a reference to a removed
element on a released page, which could later be referenced from
the functions calling it. This was due to a change brought about
by the implementation of dynamic index memory in NDB 7.6.2;
previously, the page had always belonged to a single Dbacc
instance, so accessing it was safe. This was no longer the case
following the change; a page released in Dbacc could be placed
directly into the global page pool where any other thread could
then allocate it.
Now we make sure that newly released pages in Dbacc are kept
within the current Dbacc instance and not given over directly to
the global page pool. In addition, the reference to a released
page has been removed; the affected internal method now returns
the last element by value, rather than by reference.
(Bug #87932, Bug #26906640)
References: See also: Bug #87987, Bug #26925595.
The DBTC kernel block could receive a TCRELEASEREQ signal in a
state for which it was unprepared. Now in such cases it responds
with a TCRELEASECONF message, and subsequently behaves just as
if the API connection had failed.
(Bug #87838, Bug #26847666)
References: See also: Bug #20981491.
When a data node was configured for locking threads to CPUs, it failed during startup with Failed to lock tid.
This was a side effect of a fix for a previous issue, which
disabled CPU locking based on the version of the available
glibc. The specific glibc issue being guarded against is
encountered only in response to an internal NDB API call
(Ndb_UnlockCPU()) not used by data nodes (and which can be
accessed only through internal API calls). The current fix
enables CPU locking for data nodes and disables it only for the
relevant API calls when an affected glibc version is used.
(Bug #87683, Bug #26758939)
References: This issue is a regression of: Bug #86892, Bug #26378589.
ndb_top failed to build on platforms where the ncurses library
did not define stdscr. Now these platforms require the tinfo
library to be included.
(Bug #87185, Bug #26524441)
On completion of a local checkpoint, every node sends an
LCP_COMPLETE_REP signal to every other node in the cluster; a
node does not consider the LCP complete until it has been
notified that all other nodes have sent this signal. Due to a
minor flaw in the LCP protocol, if this message was delayed from
a node other than the master, it was possible to start the next
LCP before one or more nodes had completed the one ongoing; this
caused problems with LCP_COMPLETE_REP signals from previous LCPs
becoming mixed up with such signals from the current LCP, which
in turn led to node failures.
To fix this problem, we now ensure that the previous LCP is
complete before responding to any TCGETOPSIZEREQ signal
initiating a new LCP.
(Bug #87184, Bug #26524096)
NDB Cluster did not compile successfully when the build used
WITH_UNIT_TESTS=OFF.
(Bug #86881, Bug #26375985)
Recent improvements in local checkpoint handling that use
OM_CREATE to open files did not work correctly on Windows
platforms, where the system tried to create a new file and
failed if it already existed.
(Bug #86776, Bug #26321303)
A potential hundredfold signal fan-out when sending a
START_FRAG_REQ signal could lead to a node failure due to a job
buffer full error in start phase 5 while trying to perform a
local checkpoint during a restart.
(Bug #86675, Bug #26263397)
References: See also: Bug #26288247, Bug #26279522.
Compilation of NDB Cluster failed when using -DWITHOUT_SERVER=1
to build only the client libraries.
(Bug #85524, Bug #25741111)
The NDBFS block's OM_SYNC flag is intended to make sure that all
FSWRITEREQ signals used for a given file are synchronized, but
was ignored by platforms that do not support O_SYNC, meaning
that this feature did not behave properly on those platforms.
Now the synchronization flag is used on those platforms that do
not support O_SYNC.
(Bug #76975, Bug #21049554)