MySQL NDB Cluster 8.0 Release Notes
NDB Cluster APIs: The Node.js library used to build the MySQL NoSQL Connector for JavaScript has been upgraded to version 18.12.1. (Bug #35095122)
MySQL NDB ClusterJ: Performance has been improved for accessing tables using a single-column partition key when the column is of type CHAR or VARCHAR. (Bug #35027961)
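For illustration, a minimal sketch of the kind of table this improvement targets, using a hypothetical NDB table keyed and partitioned on a single VARCHAR column (all names here are illustrative):

-- Hypothetical table with a single-column VARCHAR partition key;
-- ClusterJ primary key lookups on customer_code benefit from the
-- improved hash handling described above.
CREATE TABLE customer (
  customer_code VARCHAR(32) NOT NULL,
  name VARCHAR(100),
  PRIMARY KEY (customer_code)
) ENGINE=NDB
  PARTITION BY KEY (customer_code);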
Beginning with this release, ndb_restore implements the --timestamp-printouts option, which causes all error, info, and debug node log messages to be prefixed with timestamps. (Bug #34110068)
Microsoft Windows: Two memory leaks found by code inspection were removed from NDB process handles on Windows platforms. (Bug #34872901)
Microsoft Windows: On Windows platforms, the data node angel process did not detect whether a child data node process exited normally. We fix this by keeping an open process handle to the child and using this when probing for the child's exit. (Bug #34853213)
NDB Cluster APIs; MySQL NDB ClusterJ: MySQL ClusterJ uses a scratch buffer for primary key hash calculations which was limited to 10000 bytes; this proved too small in some cases. Now we malloc() the buffer if its size is not sufficient.

This also fixes an issue with the Ndb object methods startTransaction() and computeHash() in the NDB API: previously, if either of these methods was passed a temporary buffer of insufficient size, the method failed. Now in such cases a temporary buffer is allocated.

Our thanks to Mikael Ronström for this contribution. (Bug #103814, Bug #32959894)
NDB Cluster APIs: When dropping an event operation (NdbEventOperation) in the NDB API, it was sometimes possible for the dropped event operation to remain visible to the application after instructing the data nodes to stop sending events related to this event operation, but before all pending buffered events were consumed and discarded. This could be observed in certain cases when performing an online alter operation, such as ADD COLUMN or RENAME COLUMN, along with concurrent writes to the affected table.

Further analysis showed that the dropped events were accessible when iterating through event operations with Ndb::getGCIEventOperations(). Now, this method skips dropped events when called iteratively. (Bug #34809944)
NDB Cluster APIs: Event::getReport() always returned an error for an event opened from NDB, instead of returning the flags actually used by the report object. (Bug #34667384)
Before a new NDB table definition can be stored in the data dictionary, any existing definition must be removed. Table definitions have two unique values, the table name and the NDB Cluster se_private_id. During installation of a new table definition, we check whether there is any existing definition with the same table name and, if so, remove it. Then we check whether the table removed and the one being installed have the same se_private_id; if they do not, any definition that is occupying this se_private_id is considered stale, and removed as well.

Problems arose when no existing definition was found by the search using the table's name, since no definition was dropped even if one occupied the se_private_id, leading to a duplicate key error when attempting to store the new table. The internal store_table() function attempted to clear the diagnostics area, remove the stale definition occupying the se_private_id, and try to store the table once again, but the diagnostics area was not actually cleared; the error was thus leaked and presented to the user.

To fix this, we remove any stale table definition, regardless of any action taken (or not) by store_table(). (Bug #35089015)
Fixed the following two issues in the output of ndb_restore:
The backup file format version was shown for both the backup file format version and the version of the cluster which produced the backup.
To reduce confusion between the version of the file format and the version of the cluster which produced the backup, the backup file format version is now shown using hexadecimal notation.
(Bug #35079426)
References: This issue is a regression of: Bug #34110068.
Removed a memory leak in the DBDICT kernel block caused when an internal foreign key definition record was not released when no longer needed. This could be triggered by either of the following events:

Drop of a foreign key constraint on an NDB table

Rejection of an attempt to create a foreign key constraint on an NDB table

Such records use the DISK_RECORDS memory resource; you can check this on a running cluster by executing SELECT node_id, used FROM ndbinfo.resources WHERE resource_name='DISK_RECORDS' in the mysql client. This resource uses SharedGlobalMemory, exhaustion of which could lead not only to the rejection of attempts to create foreign keys, but also to the rejection of queries making use of joins, since the DBSPJ block also uses shared global memory by way of QUERY_MEMORY. (Bug #35064142)
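Since QUERY_MEMORY draws on the same SharedGlobalMemory pool, a query along the following lines (a sketch that simply extends the query given above to cover both resources) can be used to watch them together:

-- Watch both SharedGlobalMemory consumers mentioned above.
SELECT node_id, resource_name, used
  FROM ndbinfo.resources
  WHERE resource_name IN ('DISK_RECORDS', 'QUERY_MEMORY')
  ORDER BY node_id, resource_name;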
When attempting a copying alter operation with --ndb-allow-copying-alter-table = OFF, the reason for rejection of the statement was not always made clear to the user. (Bug #35059079)
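For illustration, a statement of the kind affected might look like the following sketch, assuming the server was started with --ndb-allow-copying-alter-table=OFF and a hypothetical NDB table t1:

-- Changing a column's data type cannot be done in place in NDB,
-- so this alter requires the copying algorithm and is rejected
-- while copying alters are disallowed.
ALTER TABLE t1 MODIFY COLUMN c1 BIGINT;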
When a transaction coordinator is starting fragment scans with many fragments to scan, it may take a realtime break (RTB) during the process to ensure fair CPU access for other requests. When the requesting API disconnected and API failure handling for the scan state occurred before the RTB continuation returned, continuation processing could not proceed because the scan state had been removed.
We fix this by adding appropriate checks on the scan state as part of the continuation process. (Bug #35037683)
Sender and receiver signal IDs were printed in trace logs as signed values even though they are actually unsigned 32-bit numbers. This could result in confusion when the top bit was set, since it caused such numbers to be shown as negatives, counting upwards from -MAX_32_BIT_SIGNED_INT. (Bug #35037396)
A fiber used by the DICT block monitors all indexes, and triggers index statistics calculations if requested by DBTUX index fragment monitoring; these calculations are performed using a schema transaction. When the DICT fiber attempted but failed to seize a transaction handle for requesting a schema transaction to be started, the fiber exited, so that no more automated index statistics updates could be performed without a node failure. (Bug #34992370)
References: See also: Bug #34007422.
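As background to what this fiber automates: index statistics for an NDB table can also be recalculated on demand from an SQL node, as in this sketch (t1 is a hypothetical NDB table):

-- Recalculate index statistics for a hypothetical NDB table on demand.
ANALYZE TABLE t1;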
Schema objects in NDB use composite versioning, comprising major and minor subversions. When a schema object is first created, its major and minor versions are set; when an existing schema object is altered in place, its minor subversion is incremented.
At restart time each data node checks schema objects as part of recovery; for foreign key objects, the versions of referenced parent and child tables (and indexes, for foreign key references not to or from a table's primary key) are checked for consistency. The table version of this check compares only major subversions, allowing tables to evolve, but the index version also compares minor subversions; this resulted in a failure at restart time when an index had been altered.
We fix this by comparing only major subversions for indexes in such cases. (Bug #34976028)
References: See also: Bug #21363253.
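For illustration, one common in-place alteration of an NDB table, which per the description above increments only the minor subversion, is an online column add; a sketch with hypothetical names (the NULL and COLUMN_FORMAT DYNAMIC requirements are assumptions about what keeps the operation in place):

-- Online (in-place) add of a nullable dynamic column to an NDB table.
ALTER TABLE t1
  ALGORITHM=INPLACE,
  ADD COLUMN note VARCHAR(64) NULL COLUMN_FORMAT DYNAMIC;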
ndb_import sometimes silently ignored hint failure for tables having large VARCHAR primary keys. For hinting which transaction coordinator to use, ndb_import can use the row's partitioning key, using a 4092-byte buffer to compute the hash for the key. This was problematic when the key included a VARCHAR column using UTF8, since the hash buffer may need to be as large, in bytes, as 24 times the column's maximum number of characters, depending on the column's collation; the hash computation failed, but the calling code in ndb_import did not check for this, and continued using an undefined hash value which yielded an undefined hint.

This did not lead to any functional problems, but was not optimal, and the user was not notified of it.

We fix this by ensuring that ndb_import always uses a sufficient buffer for handling character columns (regardless of their collations) in the key, and by adding a check in ndb_import for any failures in hash computation, reporting these to the user. (Bug #34917498)
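A sketch of the kind of target table affected, with hypothetical names; with a 255-character multi-byte key column and the worst case of up to 24 bytes per character described above, the hash buffer requirement easily exceeds the old 4092-byte buffer:

-- Hypothetical ndb_import target table; the wide multi-byte VARCHAR
-- primary key is what previously caused the hash hint computation
-- to fail silently (255 characters x up to 24 bytes per character
-- is well over 4092 bytes).
CREATE TABLE product (
  sku VARCHAR(255) CHARACTER SET utf8mb4 NOT NULL,
  qty INT,
  PRIMARY KEY (sku)
) ENGINE=NDB;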
When the ndbcluster plugin creates the ndb_schema table, the plugin inserts a row containing metadata, which is needed to keep track of this NDB Cluster instance, and which is stored as a set of key-value pairs in a row in this table.

The ndb_schema table is hidden from MySQL and so cannot be queried using SQL, but contains a UUID generated by the same MySQL server that creates the ndb_schema table; the same UUID is also stored as metadata in the data dictionary of each MySQL server when the ndb_schema table is installed on it. When a mysqld connects (or reconnects) to NDB, it compares the UUID in its own data dictionary with the UUID stored in NDB in order to detect whether it is reconnecting to the same cluster; if not, the entire contents of the data dictionary are scrapped in order to make it faster and easier to install all tables fresh from NDB.

One such case occurs when all NDB data nodes have been restarted with --initial, thus removing all data and tables. Another happens when the ndb_schema table has been restored from a backup without restoring any of its data, since this means that the row for the ndb_schema table would be missing.

To deal with these types of situations, we now make sure that, when synchronization has completed, there is always a row in the NDB dictionary with a UUID matching the UUID stored in the MySQL server data dictionary. (Bug #34876468)
When running an NDB Cluster with multiple management servers, termination of the ndb_mgmd processes required an excessive amount of time when shutting down the cluster. (Bug #34872372)
Schema distribution timeout was detected by the schema distribution coordinator after dropping and re-creating the mysql.ndb_schema table when any nodes that were subscribed beforehand had not yet resubscribed when the next schema operation began. This was due to a stale list of subscribers being left behind in the schema distribution data; these subscribers were assumed by the coordinator to be participants in subsequent schema operations.

We fix this issue by clearing the list of known subscribers whenever the mysql.ndb_schema table is dropped. (Bug #34843412)
When requesting a new global checkpoint (GCP) from the data nodes, such as by the NDB Cluster handler in mysqld to speed up delivery of schema distribution events and responses, the request was sent 100 times. While the DBDIH block attempted to merge these duplicate requests into one, it was possible on occasion to trigger more than one immediate GCP. (Bug #34836471)
When the DBSPJ block receives a query for execution, it sets up its own internal plan for how to do so. This plan is based on the query plan provided by the optimizer, with adaptations made to provide the most efficient execution of the query, both in terms of elapsed time and of total resources used.

Query plans received by DBSPJ often contain star joins, in which several child tables depend on a common parent, as in the query shown here:
SELECT STRAIGHT_JOIN * FROM t AS t1 INNER JOIN t AS t2 ON t2.a = t1.k INNER JOIN t AS t3 ON t3.k = t1.k;
In such cases DBSPJ could submit key-range lookups to t2 and t3 in parallel (but does not do so). An inner join also has the property that each inner joined row requires a match from the other tables in the same join nest, else the row is eliminated from the result set. Thus, by using the key-range lookups, we may retrieve rows from one such lookup which have no matches in the other, and that effort is ultimately wasted. Instead, DBSPJ sets up a sequential plan for such a query.

It was found that this worked as intended for queries having only inner joins, but if any of the tables are left-joined, we did not take complete advantage of the preceding inner joined tables before issuing the outer joined tables. Suppose the previous query is modified to include a left join, like this:
SELECT STRAIGHT_JOIN * FROM t AS t1 INNER JOIN t AS t2 ON t2.a = t1.k LEFT JOIN t AS t3 ON t3.k = t1.k;
Using the following query against the ndbinfo.counters table, it is possible to observe how many rows are returned for each query before and after query execution:
SELECT counter_name, SUM(val) FROM ndbinfo.counters WHERE block_name="DBSPJ" AND counter_name = "SCAN_ROWS_RETURNED";
It was thus determined that requests on t2 and t3 were submitted in parallel. Now in such cases, we wait for the inner join to complete before issuing the left join, so that unmatched rows from t1 can be eliminated from the outer join on t1 and t3. This results in less work to be performed by the data nodes, and reduces the volume handled by the transporter as well. (Bug #34782276)
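For illustration, the before-and-after comparison described above can be carried out as in this sketch (the join is the left-join example from this note; GROUP BY is added to the counters query so it returns a single labeled total):

-- Baseline reading of rows returned by DBSPJ scans.
SELECT counter_name, SUM(val) AS total
  FROM ndbinfo.counters
  WHERE block_name = 'DBSPJ' AND counter_name = 'SCAN_ROWS_RETURNED'
  GROUP BY counter_name;

-- Execute the join under test.
SELECT STRAIGHT_JOIN * FROM t AS t1 INNER JOIN t AS t2 ON t2.a = t1.k LEFT JOIN t AS t3 ON t3.k = t1.k;

-- Re-run the counters query; the increase in total is the number of
-- rows the data nodes returned for this query.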
SPJ handling of a sorted result was found to suffer a significant performance impact compared to the same result set when not sorted. Further investigation showed that most of the additional performance overhead for sorted results lay in the implementation for sorted result retrieval, which required an excessive number of SCAN_NEXTREQ round trips between the client and DBSPJ on the data nodes. (Bug #34768353)
DBSPJ now implements the firstMatch optimization for semijoins and antijoins, such as those found in EXISTS and NOT EXISTS subqueries. (Bug #34768191)
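For illustration, queries of the following forms (a sketch using hypothetical tables t1 and t2) are the kind the optimizer can execute as a semijoin or antijoin, and so can benefit from firstMatch in DBSPJ:

-- Semijoin: each t1 row needs at most one matching t2 row.
SELECT * FROM t1
  WHERE EXISTS (SELECT 1 FROM t2 WHERE t2.a = t1.k);

-- Antijoin: a t1 row is kept only if no matching t2 row exists.
SELECT * FROM t1
  WHERE NOT EXISTS (SELECT 1 FROM t2 WHERE t2.a = t1.k);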
When the DBSPJ block sends SCAN_FRAGREQ and SCAN_NEXTREQ signals to the data nodes, it tries to determine the optimum number of fragments to scan in parallel without starting more parallel scans than needed to fill the available batch buffers, thus avoiding any need to send additional SCAN_NEXTREQ signals to complete the scan of each fragment.

The DBSPJ block's statistics module calculates and samples the parallelism which was optimal for fragment scans just completed, for each completed SCAN_FRAGREQ, providing a mean and standard deviation of the sampled parallelism. From these it is possible to calculate a lower 95th percentile of the parallelism (and batch size) needed to complete a SCAN_FRAGREQ without additional SCAN_NEXTREQ signals.

It was found that the parallelism statistics seemed unable to provide a stable parallelism estimate and that the standard deviation was unexpectedly high. This often led to the parallelism estimate being a negative number (always rounded up to 1).
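To see why a large standard deviation yields a negative estimate, suppose (as an illustration only; the note does not specify the exact estimator) that the lower bound is computed as a one-sided normal-approximation percentile:

\hat{p}_{0.05} \approx \mu - 1.645\,\sigma

With \sigma large relative to \mu, this value drops below zero and is then rounded up to the minimum parallelism of 1.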
The flaw in the statistics calculation was found to be an underlying assumption that each sampled SCAN_FRAGREQ contained the same number of key ranges to be scanned, which is not necessarily the case. Typically a full batch of rows is returned for the first SCAN_FRAGREQ, and relatively few rows for the final SCAN_NEXTREQ returning the remaining rows; this resulted in wide variation in parallelism samples, which made the statistics obtained from them unreliable.

We fix this by basing the statistics on the number of keys actually sent in the SCAN_FRAGREQ, and counting the rows returned from this request. Based on this, record-per-key statistics can be calculated and sampled, which makes it possible to calculate the number of fragments that can be scanned without overflowing the batch buffers. (Bug #34768106)
It was possible in certain cases that both the NDB binary logging thread and metadata synchronization attempted to synchronize the ndb_apply_status table, which led to a race condition. We fix this by making sure that the ndb_apply_status table is monitored and created (or re-created) by the binary logging thread only. (Bug #34750992)
While starting a schema operation, the client is responsible for detecting timeouts until the coordinator has received the first schema event; from that point, any schema operation timeout should be detected by the coordinator. A problem occurred while the client was checking the timeout; it mistakenly set the state indicating that a timeout had occurred, which caused the coordinator to ignore any first schema event taking longer than approximately one second to receive (that is, for the event to be written, sent, and handled in the binary logging thread). This had the effect that, in these cases, the coordinator was not involved in the schema operation.

We fix this by changing the schema distribution timeout checking to be atomic, and by letting it be performed by either the client or the coordinator. In addition, we remove the state variable used for keeping track of events received by the coordinator, and rely on the list of participants instead. (Bug #34741743)
An SQL node did not start up correctly after restoring data with ndb_restore, such that, when it was otherwise ready to accept connections, the binary log injector thread never became ready. It was found that, when a mysqld was started after a data node initial restore from which new table IDs were generated, the utility tables' (ndb_*) MySQL data dictionary definitions might not match the NDB dictionary definitions. The existing mysqld definition is dropped by name, thus removing the unique ndbcluster-ID key in the MySQL data dictionary, but the new table ID could also already be occupied by another (stale) definition. The resulting mismatch prevented setup of the binary log.

To fix this problem we now explicitly drop any ndbcluster-ID definitions that might clash in such cases with the table being installed. (Bug #34733051)
After receiving a SIGTERM signal, ndb_mgmd did not wait for all threads to shut down before exiting. (Bug #33522783)
References: See also: Bug #32446105.
When multiple operations are pending on a single row, it is not possible to commit an operation which is run concurrently with an operation which is pending abort. This could lead to data node shutdown during the commit operation in DBACC, which could manifest when a single transaction contained more than MaxDMLOperationsPerTransaction DML operations.

In addition, a transaction containing insert operations is rolled back if a statement that uses a locking scan on the prepared insert fails due to too many DML operations. This could lead to an unplanned data node shutdown during tuple deallocation due to a missing reference to the expected DBLQH deallocation operation.

We solve this issue by allowing commit of a scan operation in such cases, in order to release locks previously acquired during the transaction. We also add a new special case for this scenario, so that the deallocation is performed in a single phase, and DBACC tells DBLQH to deallocate immediately; in DBLQH, execTUP_DEALLOCREQ() is now able to handle this immediate deallocation request. (Bug #32491105)
References: See also: Bug #28893633, Bug #32997832.
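A sketch of the kind of statement that can exceed the per-transaction DML limit, assuming hypothetical NDB tables t1 and t2 with identical definitions, where t1 holds more rows than MaxDMLOperationsPerTransaction allows:

-- Each inserted row counts as one DML operation, so this single
-- statement exceeds MaxDMLOperationsPerTransaction, fails, and the
-- transaction is rolled back.
INSERT INTO t2 SELECT * FROM t1;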
Cluster nodes sometimes reported Failed to convert connection to transporter warnings in logs, even when this was not really necessary. (Bug #14784707)
When started with no connection string on the command line, ndb_waiter printed Connecting to mgmsrv at (null). Now in such cases, it prints Connecting to management server at nodeid=0,localhost:1186 if no other default host is specified.

The --help option and other ndb_waiter program output was also improved. (Bug #12380163)
NdbSpin_Init() calculated the wrong number of loops in NdbSpin, and contained logic errors. (Bug #108448, Bug #32497174, Bug #32594825)
References: See also: Bug #31765660, Bug #32413458, Bug #102506, Bug #32478388.