MySQL NDB Cluster 8.0 Release Notes
NDB Cluster APIs: The version of Node.js used by NDB has been upgraded to 12.20.1. (Bug #32356419)
ndbinfo Information Database: Added the dict_obj_tree table to the ndbinfo information database. This table provides information about NDB database objects similar to what is shown by the dict_obj_info table, but presents it in a hierarchical or tree-like fashion that simplifies seeing relationships between objects such as: tables and indexes; tablespaces and data files; log file groups and undo log files.
An example of such a view of a table t1, having a primary key on column a and a unique key on column b, is shown here:
mysql> SELECT indented_name FROM ndbinfo.dict_obj_tree
    -> WHERE root_name = 'test/def/t1';
+----------------------------+
| indented_name              |
+----------------------------+
| test/def/t1                |
|   -> sys/def/13/b          |
|     -> NDB$INDEX_15_CUSTOM |
|   -> sys/def/13/b$unique   |
|     -> NDB$INDEX_16_UI     |
|   -> sys/def/13/PRIMARY    |
|     -> NDB$INDEX_14_CUSTOM |
+----------------------------+
7 rows in set (0.15 sec)
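For reference, the table t1 queried in this example could have been created with a statement such as the following; the internal index names shown in the output are assigned automatically by NDB and vary between installations:
mysql> CREATE TABLE test.t1 (
    ->     a INT NOT NULL PRIMARY KEY,
    ->     b INT,
    ->     UNIQUE KEY (b)
    -> ) ENGINE=NDBCLUSTER;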
For additional information and examples, see The ndbinfo dict_obj_tree Table. (Bug #32198754)
ndbinfo Information Database: Added the backup_id table to the ndbinfo information database. This table contains a single column (id) and a single row, in which the column value is the backup ID of the most recent backup of the cluster taken with the ndb_mgm client. If no NDB backups can be found, the value is 0.
Selecting from this table replaces the process of obtaining this information by using the ndb_select_all utility to dump the contents of the internal SYSTAB_0 table, which is error-prone and can require an excessively long time to complete.
(Bug #32073640)
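A minimal example of querying this table from the mysql client is shown here; the backup ID value shown is illustrative only:
mysql> SELECT id FROM ndbinfo.backup_id;
+------+
| id   |
+------+
|  984 |
+------+
1 row in set (0.03 sec)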
Added the status variable Ndb_config_generation, which shows the generation number of the current configuration being used by the cluster. This can be used as an indicator to determine whether the configuration of the cluster has changed.
(Bug #32247424)
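You can check the current value from the mysql client, as shown in this example (the generation number shown is illustrative):
mysql> SHOW GLOBAL STATUS LIKE 'Ndb_config_generation';
+-----------------------+-------+
| Variable_name         | Value |
+-----------------------+-------+
| Ndb_config_generation | 1     |
+-----------------------+-------+
1 row in set (0.01 sec)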
NDB Cluster now uses the MySQL
host_application_signal
component service to
perform shutdown of SQL nodes.
(Bug #30535835, Bug #32004109)
NDB has implemented the following two improvements in the calculation of index statistics:
Previously, index statistics were collected from a single fragment only; this has been changed so that additional fragments are now used as well.
The algorithm used for very small tables, such as those having very few rows whose results were formerly discarded, has been improved, so that estimates for such tables should now be more accurate than previously.
See NDB API Statistics Counters and Variables for more information. (WL #13144)
A number of NDB Cluster programs now support input of the password for encrypting or decrypting an NDB backup from standard input. Changes relating to each program affected are listed here; a combined example appears following this list:
For ndb_restore, the --backup-password-from-stdin option introduced in this release enables input of the password in a secure fashion, similar to how it is done by the mysql client's --password option. Use this option together with the --decrypt option.
ndb_print_backup_file now also supports --backup-password-from-stdin as the long form of the existing -P option.
For ndb_mgm, --backup-password-from-stdin is supported together with --execute "START BACKUP [options]" for starting an encrypted cluster backup from the system shell, and has the same effect.
Two options for ndbxfrm, --encrypt-password-from-stdin and --decrypt-password-from-stdin, which are also introduced in this release, cause similar behavior when using this program, respectively, to encrypt or to decrypt a backup file.
In addition, you can cause ndb_mgm to use encryption whenever it creates a backup by starting it with --encrypt-backup. In this case, the user is prompted for a password when invoking START BACKUP if none is supplied. This option can also be specified in the [ndb_mgm] section of the my.cnf file.
Also, the behavior and syntax of the ndb_mgm management client START BACKUP command are changed slightly, such that it is now possible to use the ENCRYPT option without also specifying PASSWORD. When the user does this, the management client prompts the user for a password.
For more information, see the descriptions of the NDB Cluster programs and program options just mentioned, as well as Online Backup of NDB Cluster. (WL #14259)
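The following sketch illustrates how these options might be used together from the system shell; the password, node ID, backup ID, and backup path shown here are placeholders only, and the programs read the password from standard input as piped here (or securely from the terminal when run interactively):
shell> echo 'MyBackupPassword' | ndb_mgm --backup-password-from-stdin \
          --execute "START BACKUP ENCRYPT"
shell> echo 'MyBackupPassword' | ndb_restore --decrypt --backup-password-from-stdin \
          --nodeid=1 --backupid=1 --restore-data \
          --backup-path=/var/lib/mysql-cluster/BACKUP/BACKUP-1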
Packaging: The mysql-cluster-community-server-debug and mysql-cluster-commercial-server-debug RPM packages were dependent on mysql-community-server and mysql-commercial-server, respectively, instead of mysql-cluster-community-server and mysql-cluster-commercial-server. (Bug #32683923)
Packaging: RPM upgrades from NDB 7.6.15 to 8.0.22 did not succeed, due to a file having been moved from the server RPM to the client-plugins RPM. (Bug #32208337)
Linux: On Linux systems, NDB interpreted memory sizes obtained from /proc/meminfo as being supplied in bytes rather than kilobytes. (Bug #102505, Bug #32474829)
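For reference, /proc/meminfo reports most sizes in kilobytes, as indicated by the unit suffix in its output; for example (the value shown here varies by system):
shell> grep MemTotal /proc/meminfo
MemTotal:       16384256 kB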
Microsoft Windows: Removed several warnings which were generated when building NDB Cluster on Windows using Microsoft Visual Studio 2019. (Bug #32107056)
Microsoft Windows: NDB failed to start correctly on Windows when initializing the NDB library with ndb_init(), with the error Failed to find CPU in CPU group.
This issue was due to how Windows works with regard to assigning
processes to CPUs: when there are more than 64 logical CPUs on a
machine, Windows divides them into different processor groups
during boot. Each processor group can at most hold 64 CPUs; by
default, a process can be assigned to only one processor group.
The function std::thread::hardware_concurrency() was used to get the maximum number of logical CPUs on the machine, but on Windows this function returns only the maximum number of logical CPUs present in the processor group with which the current process is affiliated. This value was then used to allocate memory for an array holding hardware information about each CPU on the machine. Since the array held valid memory for CPUs from only one processor group, any attempt to store or retrieve hardware information about a CPU in a different processor group led to out-of-bounds reads and writes on this array, causing memory corruption and ultimately process failure.
This is fixed by using GetActiveProcessorCount() instead of the hardware_concurrency() function referenced previously.
(Bug #101347, Bug #32074703)
Solaris: While preparing NDBFS for handling of encrypted backups, activation of O_DIRECT was suspended until after initialization of files was completed. This caused initialization of redo log files to require an excessive amount of time on systems using hard disk drives with ext3 file systems.
On Solaris, directio is used instead of O_DIRECT; activating directio prior to initialization of files caused a notable increase in the time required when using hard disk drives with UFS file systems.
Now we ensure that, on systems having O_DIRECT, this is activated before initialization of files, and that, on Solaris, directio continues to be activated after initialization of files.
(Bug #32187942)
NDB Cluster APIs: Several NDB API coding examples included in the source did not release all resources allocated. (Bug #31987735)
NDB Cluster APIs: Some internal dictionary objects in NDB used an internal name format which depends on the database name of the Ndb object. This dependency has been made more explicit where necessary, and otherwise removed.
Users of the NDB API should be aware that the fullyQualified argument to Dictionary::listObjects() still works in such a way that specifying it as false causes the objects in the list it returns to use fully qualified names.
(Bug #31924949)
ndbinfo Information Database: The system variables ndbinfo_database and ndbinfo_table_prefix are intended to be read-only. It was found to be possible to set mysqld command-line options corresponding to either or both of these; doing so caused the ndbinfo database to malfunction. This fix ensures that it is no longer possible to set either of these variables in the mysql client or from the command line.
(Bug #23583256)
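Both variables remain readable; attempting to set one of them at runtime fails because it is read-only, as shown in this sketch (the exact error text can vary between versions):
mysql> SELECT @@ndbinfo_database;
+--------------------+
| @@ndbinfo_database |
+--------------------+
| ndbinfo            |
+--------------------+
1 row in set (0.00 sec)

mysql> SET GLOBAL ndbinfo_database = 'myinfo';
ERROR 1238 (HY000): Variable 'ndbinfo_database' is a read only variable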
In some cases, a query affecting a user with the NDB_STORED_USER privilege could be printed to the MySQL server log without being rewritten. Now such queries are omitted or rewritten to remove any text following the keyword IDENTIFIED.
(Bug #32541096)
The value set for the SpinMethod data node configuration parameter was ignored.
(Bug #32478388)
The compile-time debug flag DEBUG_FRAGMENT_LOCK was enabled by default. This caused increased resource usage by DBLQH, even for release builds.
This is fixed by disabling DEBUG_FRAGMENT_LOCK by default.
(Bug #32459625)
ndb_mgmd now exits gracefully in the event of a SIGTERM, just as it does following a management client SHUTDOWN command.
(Bug #32446105)
When started on a port which was already in use, ndb_mgmd did not report any error, since the use of SO_REUSEADDR on Windows platforms allowed multiple sockets to bind to the same address and port.
To take care of this issue, we replace SO_REUSEADDR with SO_EXCLUSIVEADDRUSE, which prevents reuse of a port that is already in use.
(Bug #32433002)
Encountering an error in detection of an initial system restart of the cluster caused the SQL node to exit prematurely. (Bug #32424580)
Under some circumstances, when trying to measure the time of a CPU pause, an elapsed time of zero could result. In addition, computing the average for a very fast spin (for example, 100 loops taking less than 100 ns) could yield zero nanoseconds. In both cases, this caused the spin calibration algorithm to throw an arithmetic exception due to division by zero.
We fix both issues by modifying the algorithm so that it ignores zero values when computing mean spin time. (Bug #32413458)
References: See also: Bug #32497174.
Table and database names were not formatted correctly in the messages written to the mysqld error log when the internal method Ndb_rep_tab_reader::scan_candidates() found ambiguous matches for a given database, table, or server ID in the ndb_replication table.
(Bug #32393245)
Some queries with nested pushed joins were not processed correctly. (Bug #32354817)
When ndb_mgmd allocates a node ID, it reads through the configuration to find a suitable ID, causing a mutex to be held while performing hostname lookups. Because network address resolution can require large amounts of time, it is not considered good practice to hold such a mutex or lock while performing network operations.
This issue is fixed by building a list of configured nodes while holding the mutex, then using the list to perform hostname matching and other logic. (Bug #32294679)
The schema distribution participant failed to start a global
checkpoint after writing a reply to the
ndb_schema_result
table, which caused an
unnecessary delay before the coordinator received events from
the participant notifying it of the result.
(Bug #32284873)
The global DNS cache used in ndb_mgmd caused stale lookups when restarting a node on a new machine with a new IP address, which meant that the node could not allocate a node ID.
This issue is addressed by the following changes:
Node ID allocation no longer depends on LocalDnsCache
DnsCache now uses local scope only
(Bug #32264914)
ndb_restore generated a core file when started with unknown or invalid arguments. (Bug #32257374)
Auto-synchronization detected the presence of mock foreign key tables in the NDB dictionary and attempted to re-create them in the MySQL server's data dictionary, although these should remain internal to the NDB Dictionary and not be exposed to the MySQL server. To fix this issue, we now ensure that the NDB Cluster auto-synchronization mechanism ignores any such mock tables. (Bug #32245636)
Improved resource usage associated with handling of cluster configuration data. (Bug #32224672)
Removed left-over debugging printouts from ndb_mgmd showing a client's version number upon connection. (Bug #32210216)
References: This issue is a regression of: Bug #30599413.
The backup abort protocol for handling of node failures did not function correctly for single-threaded data nodes (ndbd). (Bug #32207193)
While retrieving sorted results from a pushed-down join using ORDER BY with the index access method (and without filesort), an SQL node sometimes unexpectedly terminated.
(Bug #32203548)
Logging of redo log initialization showed log part indexes rather than log part numbers. (Bug #32200635)
Signal data was overwritten (and lost) due to use of extended signal memory as temporary storage. Now in such cases, extended signal memory is not used in this fashion. (Bug #32195561)
When ClassicFragmentation = 1, the default number of partitions per node (shown in ndb_desc output as PartitionCount) is calculated using the lowest number of LDM threads employed by any single live node. This calculation was performed only once, and was not repeated even after data nodes left or joined the cluster, possibly with a new configuration changing the LDM thread count and thus the default partition count. Now in such cases, we make sure the default number of partitions per node is recalculated each time data nodes join or leave the cluster.
This is not an issue in NDB 8.0.23 and later, when ClassicFragmentation is set to 0.
(Bug #32183985)
The internal function Ndb_ReloadHWInfo() is responsible for updating hardware information for all the CPUs on the host. For the Linux ARM platform, which does not have Level 3 cache information, this function assigned a socket ID for the L3 cache ID but failed to record the value in the global variable num_shared_l3_caches, which is needed when creating lists of CPUs connected to a shared L3 cache.
(Bug #32180383)
When trying to run two management nodes on the same host and using the same port number, it was not always obvious to users why they did not start. Now in such cases, in addition to writing a message to the error log, an error message Same port number is specified for management nodes node_id1 and node_id2 (or) they both are using the default port number on same host host_name is also written to the console, making the source of the issue more immediately apparent.
(Bug #32175157)
Added a --cluster-config-suffix option for ndb_mgmd and ndb_config, for use in internal testing to override a defaults group suffix.
(Bug #32157276)
The management server returned the wrong status for host name matching when some of the host names in the configuration did not resolve and a client trying to allocate a node ID connected from the host whose host name resolved to a loopback address, with the error Could not alloc node id at <host>:<port>: Connection with id X done from wrong host ip 127.0.0.1, expected <unresolvable_host> (lookup failed).
This caused the connecting client to fail the node ID allocation.
This issue is fixed by rewriting the internal match_hostname() function so that it contains all logic for how the requesting client address should match the configured hostnames, and so that it first checks whether the configured host name can be resolved; if not, it now returns a special error so that the client receives an error indicating that node ID allocation can be retried. The new error is Could not alloc node id at <host>:<port>: No configured host found of node type <type> for connection from ip 127.0.0.1. Some hostnames are currently unresolvable. Can be retried. (Bug #32136993)
The internal function ndb_socket_create_dual_stack() did not close a newly created socket when a call to ndb_setsockopt() was unsuccessful.
(Bug #32105957)
The local checkpoint (LCP) mechanism was changed in NDB 7.6 such that it also detected idle fragments—that is, fragments which had not changed since the last LCP and thus required no on-disk metadata update. The LCP mechanism could then immediately proceed to handle the next fragment. When there were a great many such idle fragments, the CPU consumption required merely to loop through these became highly significant, causing latency spikes in user transactions.
A 1 ms delay was already inserted between each such idle fragment being handled. Testing later showed this to be too short an interval, and that we are normally not in as great a hurry to complete these idle fragments as we previously believed.
This fix extends the idle fragment delay time to 20 ms if there are no redo alerts indicating an urgent need to complete the LCP. In case of a low redo alert state we wait 5 ms instead, and for a higher alert state we fall back to the 1 ms delay. (Bug #32068551)
References: See also: Bug #31655158, Bug #31613158.
When an NDB table was created, it was invalidated in the global dictionary cache, but this was unnecessary. Furthermore, having the table present in the global dictionary cache is actually an advantage for subsequent uses of the new table, since it can then be found in the table cache without performing a round trip to NDB.
(Bug #32047456)
No clear error message was provided when an ndb_mgmd process tried to start using a PortNumber that was already in use.
(Bug #32045786)
Two problems occurred when NDB closed a table:
NDB failed to detect when the close was done from FLUSH TABLES, which meant that the NDB table definitions in the global dictionary cache were not invalidated.
When the close was done by a thread which had not used NDB earlier (for example, when FLUSH TABLES or RESET MASTER closed instances of ha_ndbcluster held in the table definition cache), a new Thd_ndb object was allocated. There is already a fallback to the global Ndb object for the case in which this allocation fails; since allocation never fails in such cases, it is less wasteful simply to use the global object already provided.
(Bug #32018394, Bug #32357856)
Removed a large number of compiler warnings relating to unused function arguments in NdbDictionaryImpl.
(Bug #31960757)
Unnecessary casts were performed when checking internal error codes. (Bug #31930166)
NDB continued to use file system paths for determining the names of tables to open or perform DDL on, in spite of the fact that it no longer actually uses files for these operations. This required unnecessary translation between character sets, handling of the MySQL-specific file system encoding, and parsing. In addition, the results of these operations were stored in buffers of fixed size, each instance of which used several hundred bytes of memory unnecessarily. Since the database and table names to use are already available to NDB through other means, this translation could be (and has been) removed in most cases.
(Bug #31846478)
Generation of internal statistics relating to NDB object counts was found to lead to an increase in transaction latency at very high rates of transactions per second, brought about by returning an excessive number of freed NDB objects.
(Bug #31790329)
NDB behaved unpredictably in response to an attempt to change permissions for a distributed user (that is, a user having the NDB_STORED_USER privilege) during a binary log thread shutdown and restart. We address this issue by ensuring that the user gets a clear warning Could not distribute ACL change to other MySQL servers whenever distribution does not succeed. This fix also improves a number of mysqld log messages.
(Bug #31680765)
ndb_restore encountered intermittent errors while replaying backup logs which deleted blob values; this was due to deletion of blob parts when a main table row containing one or more blob values was deleted. This is fixed by modifying ndb_restore to use the asynchronous API for blob deletes, which (unlike the synchronous API) does not trigger blob part deletes when a blob main table row is deleted, so that a delete log event for the main table deletes only the row from the main table. (Bug #31546136)
When a table creation schema transaction is prepared, the table is in TS_CREATING state, and is changed to TS_ACTIVE state when the schema transaction commits on the DBDIH block.
In the case where the node acting as DBDIH coordinator fails while the schema transaction is committing, another node starts taking over for the coordinator. The following actions are taken when handling this node failure:
DBDICT rolls the table creation schema transaction forward and commits, resulting in the table involved changing to TS_ACTIVE state.
DBDIH starts removing the failed node from tables by moving active table replicas on the failed node from a list of stored fragment replicas to another list.
These actions are performed asynchronously many times, and their interleaving may cause a race condition. As a result, the replica list in which the replica of a failed node resides becomes nondeterministic and may differ between the recovering node (that is, the new coordinator) and the other DIH participant nodes. This difference violated the requirement of knowing in which list the failed node's replicas can be found during recovery of the failed node on the other participants.
To fix this, moving active table replicas now covers not only tables in TS_ACTIVE state, but those in TS_CREATING (prepared) state as well, since the prepared schema transaction is always rolled forward.
In addition, the state of a table creation schema transaction which is being aborted is now changed from TS_CREATING or TS_IDLE to TS_DROPPING, to avoid any race condition there.
(Bug #30521812)
START BACKUP SNAPSHOTSTART WAIT STARTED could return control to the user prior to the backup's restore point from the user's point of view; that is, the Backup started notification was sent before waiting for the synchronizing global checkpoint (GCP) boundary. This meant that transactions committed after receiving the notification might be included in the restored data.
To fix this problem, START BACKUP now sends a notification to the client that the backup has started only after the GCP has truly started.
(Bug #29344262)
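For reference, such a backup can be started from the system shell as shown here (a sketch; the connection string is a placeholder):
shell> ndb_mgm -c mgmhost:1186 -e 'START BACKUP SNAPSHOTSTART WAIT STARTED'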
Upgrading to NDB Cluster 8.0 from a prior release includes an
upgrade in the schema distribution mechanism, as part of which
the ndb_schema
table is dropped and recreated
in a way which causes all MySQL Servers connected to the cluster
to restart their binary log injector threads, causing a gap
event to be written to the binary log. Since the thread restart
happens at the same time on all MySQL Servers, no binary log
spans the time during which the schema distribution
functionality upgrade was performed, which breaks NDB Cluster
Replication.
This issue is fixed by adding support for gracefully
reconstituting the schema distribution tables while allowing the
injector thread to continue processing changes from the cluster.
This is implemented by handling the DDL event notification for
DROP TABLE
to turn off support
for schema distribution temporarily, and to start regular checks
to re-create the tables. When the tables have been successfully
created again, the regular checks are turned off and support for
schema distribution is turned back on.
NDB
also now detects automatically when the
ndb_apply_status
table has been dropped and
re-creates it. The drop and re-creation leaves a gap event in
the binary log, which in a replication setup causes the replica
MySQL Server to stop applying changes from the source until the
replication channel is restarted (see
ndb_apply_status Table).
In addition, the minimum version required to perform the schema distribution upgrade is raised to 8.0.24, which prevents automatic triggering of the schema distribution upgrade until all connected API nodes support the new upgrade procedure.
For more information, see NDB Cluster Replication Schema and Tables. (Bug #27697409, Bug #30877233)
References: See also: Bug #30876990.
Fixed a number of issues uncovered when trying to build NDB with GCC 6.
(Bug #25038373)
Calculation of the redo alert state based on redo log usage was overly aggressive, and thus incorrect, when using more than 1 log part per LDM.