MySQL NDB Cluster 8.0 Release Notes
Schema operation timeout detection has been moved from the schema distribution client to the schema distribution coordinator, which now checks ongoing schema operations for timeout at regular intervals, marks participants that have timed out, emits suitable warnings when a schema operation timeout occurs, and prints a list of any ongoing schema operations at regular intervals.
As part of this work, a new option --ndb-schema-dist-timeout makes it possible to set the number of seconds for a given SQL node to wait until a schema operation is marked as having timed out.
(Bug #29556148)
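The interval-driven timeout check described above can be pictured as a periodic sweep over ongoing operations. The sketch below is a simplified Python illustration, not NDB source code; all names (SchemaOperation, check_timeouts) are hypothetical.

```python
import time

class SchemaOperation:
    """Hypothetical stand-in for an ongoing schema operation."""
    def __init__(self, name, started_at, participants):
        self.name = name
        self.started_at = started_at
        # participant node id -> completed flag
        self.participants = dict.fromkeys(participants, False)
        self.timed_out = set()

def check_timeouts(operations, timeout_seconds, now=None):
    """Mark participants of any operation older than the timeout.

    Mirrors the idea of the coordinator checking ongoing schema
    operations for timeout at regular intervals (the interval loop
    itself is omitted); returns the warnings to be emitted.
    """
    now = time.time() if now is None else now
    warnings = []
    for op in operations:
        if now - op.started_at < timeout_seconds:
            continue
        for node, done in op.participants.items():
            if not done and node not in op.timed_out:
                op.timed_out.add(node)
                warnings.append(
                    f"Schema operation '{op.name}' timed out on node {node}")
    return warnings
```

A participant is warned about only once; a second sweep over the same stale operation produces no further warnings.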
Added the status variable Ndb_trans_hint_count_session, which shows the number of transactions started in the current session that used hints. Compare this with Ndb_api_trans_start_count_session to get the proportion of all NDB transactions in the current session that have been able to use hinting.
(Bug #29127040)
When the cluster is in single user mode, the output of the ndb_mgm SHOW command now indicates which API or SQL node has exclusive access while this mode is in effect.
(Bug #16275500)
Important Change: Attempting to drop, using the mysql client, an NDB table that existed in the MySQL data dictionary but not in NDB caused mysqld to fail with an error. This situation could occur when an NDB table was dropped using the ndb_drop_table tool or in an NDB API application using dropTable(). Now in such cases, mysqld drops the table from the MySQL data dictionary without raising an error.
(Bug #29125206)
Important Change: The dependency of ndb_restore on the NDBT library, which is used for internal testing only, has been removed. This means that the program no longer prints NDBT_ProgramExit: ... when terminating. Applications that depend on this behavior should be updated to reflect this change when upgrading to this release.
(WL #13117)
Packaging: Added debug symbol packages to NDB distributions for .deb-based platforms which do not generate these automatically.
(Bug #29040024)
NDB Disk Data: If, for some reason, a disk data table exists in the NDB data dictionary but not in that of the MySQL server, the data dictionary is synchronized by installing the object. This can occur either during the schema synchronization phase when a MySQL server connects to an NDB Cluster, or during table discovery through a DML query or DDL statement.
For disk data tables which used a tablespace for storage, the tablespace ID is stored as part of the data dictionary object, but this ID was not set during synchronization. (Bug #29597249)
NDB Disk Data: Concurrent Disk Data table and tablespace DDL statements executed on the same SQL node caused a metadata lock deadlock. A DDL statement requires that an exclusive lock be taken on the object being modified, and every such lock in turn requires that the global schema lock be acquired in NDB.
To fix this issue, NDB now tracks when a global schema lock corresponding to an exclusive lock on a tablespace is taken. If a different global schema lock request fails while the first lock is held, NDB assumes that there is a deadlock. In this case, the deadlock is handled by having the new request release all locks it previously acquired, then retrying them at a later point.
(Bug #29394407)
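The release-all-then-retry strategy described above can be sketched as follows. This is a hedged illustration in Python, not the NDB implementation; the lock manager interface (try_lock/release) and all names are assumptions made for the example.

```python
class LockUnavailable(Exception):
    """Raised when a lock cannot be acquired (presumed deadlock)."""

def acquire_with_deadlock_handling(lock_manager, wanted, max_retries=5):
    """Acquire the wanted locks in order; on any failure, release
    everything already held and retry the whole request later.

    lock_manager is a hypothetical object exposing try_lock(name)
    -> bool and release(name).
    """
    for _ in range(max_retries):
        held = []
        try:
            for name in wanted:
                if not lock_manager.try_lock(name):
                    raise LockUnavailable(name)
                held.append(name)
            return held  # all locks acquired
        except LockUnavailable:
            # Presumed deadlock: back off by releasing what we hold,
            # so the conflicting request can make progress.
            for name in reversed(held):
                lock_manager.release(name)
    raise TimeoutError("could not acquire locks without deadlock")
```

Releasing in reverse acquisition order is a conventional choice; the essential point is that the new request gives up all of its locks before retrying.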
References: See also: Bug #29175268.
NDB Disk Data: Following execution of ALTER TABLESPACE, SQL statements on an existing table using the affected tablespace failed with error 3508 Dictionary object id (id) does not exist, where the object ID shown refers to the tablespace. Schema distribution of ALTER TABLESPACE involves dropping the old object from the data dictionary on a participating SQL node and creating a new one with a different dictionary object ID, but the table object in the SQL node's data dictionary still used the old tablespace ID, which rendered it unusable on the participants.
To correct this problem, tables using the tablespace are now retrieved and stored prior to the creation of the new tablespace, and then updated with the new object ID of the tablespace after it has been created in the data dictionary. (Bug #29389168)
NDB Cluster APIs: The memcached sources included with the NDB distribution would not build with -Werror=format-security. Now warnings are no longer treated as errors when compiling these files.
(Bug #29512411)
NDB Cluster APIs: It was not possible to scan a table whose SingleUserMode property had been set to SingleUserModeReadWrite or SingleUserModeReadOnly.
(Bug #29493714)
NDB Cluster APIs: The MGM API ndb_logevent_get_next2() function did not behave correctly on Windows and 32-bit Linux platforms.
(Bug #94917, Bug #29609070)
The version of Python expected by ndb_setup.py was not specified clearly on some platforms. (Bug #29818645)
Lack of SharedGlobalMemory was incorrectly reported as lack of undo buffer memory, even though the cluster used no disk data tables.
(Bug #29806771)
References: This issue is a regression of: Bug #92125, Bug #28537319.
Long TCKEYREQ signals did not always use the expected format when invoked from TCINDXREQ processing.
(Bug #29772731)
It was possible for an internal NDB_SCHEMA_OBJECT to be released too early or not at all; in addition, it was possible to create such an object that reused an existing key.
(Bug #29759063)
ndb_restore sometimes used exit() rather than exitHandler() to terminate the program, which could lead to resources not being properly freed.
(Bug #29744353)
Improved the error message printed when the maximum offset for a FIXED column is exceeded.
(Bug #29714670)
Communication between the schema distribution client and the schema distribution coordinator is done using NDB_SCHEMA_OBJECT as well as by writing rows to the ndb_schema table in NDB. This allowed for a number of possible race conditions between the registration of a schema operation and the notification of the coordinator.
This fix addresses the following issues related to the situation just described:
The coordinator failed to abort active schema operations when the binary logging thread was restarted.
Schema operations already registered were not aborted properly.
The distribution client failed to detect correctly when schema distribution was not ready.
The distribution client, when killed, exited without marking the current schema operation as failed.
An operation in NDB_SHARE could be accessed without the proper locks being in place.
In addition, usage of the ndb_schema_share global pointer was removed and replaced with detecting whether the schema distribution is ready by checking whether an operation for mysql.ndb_schema has been created in NDB_SHARE.
(Bug #29639381)
With DataMemory set to 200 GB, ndbmtd failed to start.
(Bug #29630367)
When a backup fails due to ABORT_BACKUP_ORD being received while waiting for buffer space, the backup calls closeScan() and then sends a SCAN_FRAGREQ signal to the DBLQH block to close the scan. As part of receiving SCAN_FRAGCONF in response, scanConf() is called on the operation object for the file record, which in turn calls updateWritePtr() on the file system buffer (FsBuffer). At this point the length sent by updateWritePtr() should be 0, but in this case it was not, which meant that the buffer appeared not to have enough space even though it did. The problem was that the size is calculated as scanStop - scanStart, and these values were held over from when the previous SCAN_FRAGCONF was received, not having been reset due to being out of buffer space.
To avoid this problem, we now set scanStart = scanStop in confirmBufferData() (formerly scanConfExtra()), which is called as part of processing the SCAN_FRAGCONF, indirectly by scanConf() for the backup and first local checkpoint files, and directly for the LCP files which use only the operation record for the data buffer.
(Bug #29601253)
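The effect of a stale scanStart/scanStop range, and of the reset that fixes it, can be modeled with a toy buffer. The sketch below is a hypothetical Python illustration of the bookkeeping only, not the FsBuffer implementation; all names and the capacity arithmetic are assumptions for the example.

```python
class FsBufferSketch:
    """Toy model of a buffer tracking a scan data range.

    scan_stop - scan_start is the length of data pending from the
    last scan; if the range is not reset after being accounted for,
    a later bookkeeping call sees a stale nonzero length.
    """
    def __init__(self, capacity):
        self.capacity = capacity
        self.used = 0
        self.scan_start = 0
        self.scan_stop = 0

    def pending_length(self):
        return self.scan_stop - self.scan_start

    def update_write_ptr(self):
        """Account for pending scan data; fails if it would overflow."""
        length = self.pending_length()
        if self.used + length > self.capacity:
            raise MemoryError("buffer out of space")
        self.used += length

    def confirm_buffer_data(self):
        # The fix: reset the range so a later close-out sees length 0.
        self.scan_start = self.scan_stop
```

Without the confirm_buffer_data() reset, a second update_write_ptr() re-counts the stale range and spuriously reports the buffer as out of space; with the reset, the second call accounts for a length of 0.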
The setting for MaxDMLOperationsPerTransaction was not validated in a timely fashion, leading to data node failure rather than a management server error in the event that its value exceeded that of MaxNoOfConcurrentOperations.
(Bug #29549572)
Data nodes could fail due to an assert in the DBTC block under certain circumstances in resource-constrained environments.
(Bug #29528188)
An upgrade to NDB 7.6.9 or later from an earlier version could not be completed successfully if the redo log was filled to more than 25% of capacity. (Bug #29506844)
When the DBSPJ block called the internal function lookup_resume() to schedule a previously enqueued operation, it used a correlation ID which could have been produced from its immediate ancestor in the execution order, and not its parent in the query tree as assumed. This could happen during execution of a SELECT STRAIGHT_JOIN query.
Now NDB checks whether the execution ancestor is different from the query tree parent, and if so, performs a lookup of the query tree parent, and the parent's correlation ID is enqueued to be executed later.
(Bug #29501263)
When a new master took over, sending a MASTER_LCP_REQ signal and executing MASTER_LCPCONF from participating nodes, it expected that they had not completed the current local checkpoint under the previous master, which need not be true.
(Bug #29487340, Bug #29601546)
When restoring TINYBLOB columns, ndb_restore now treats them as having the BINARY character set.
(Bug #29486538)
When selecting a sorted result set from a query that included a LIMIT clause on a single table, and where the sort was executed as Using filesort and the ref access method was used on an ordered index, it was possible for the result set to be missing one or more rows.
(Bug #29474188)
Restoration of epochs by ndb_restore failed due to temporary redo errors. Now ndb_restore retries epoch updates when such errors occur. (Bug #29466089)
ndb_restore tried to extract an 8-character substring of a table name when checking to determine whether or not the table was a blob table, regardless of the length of the name. (Bug #29465794)
When a pushed join was used in combination with the eq_ref access method, it was possible to obtain an incorrect join result due to the 1-row cache mechanism implemented in NDB 8.0.16 as part of the work done in that version to extend NDB condition pushdown by allowing referring values from previous tables. This issue is now fixed by turning off this caching mechanism and reading the row directly from the handler instead when there is a pushed condition defined on the table.
(Bug #29460314)
Improved and made more efficient the conversion of rows by the ha_ndbcluster handler from the format used internally by NDB to that used by the MySQL server for columns that contain neither BLOB nor BIT values, which is the most common case.
(Bug #29435461)
A failed DROP TABLE could be attempted an infinite number of times in the event of a temporary error. Now in such cases, the number of retries is limited to 100.
(Bug #29355155)
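The bounded-retry behavior described above amounts to the following pattern. This is a generic Python sketch under stated assumptions, not NDB code; TemporaryError and drop_fn are hypothetical stand-ins for a retryable NDB error and the drop operation.

```python
class TemporaryError(Exception):
    """Hypothetical stand-in for a retryable (temporary) error."""

def drop_with_retry(drop_fn, max_retries=100):
    """Call drop_fn until it succeeds or the retry limit is reached.

    Returns the number of attempts on success; raises RuntimeError
    once max_retries attempts have all failed, instead of retrying
    forever as the buggy behavior did.
    """
    for attempt in range(1, max_retries + 1):
        try:
            drop_fn()
            return attempt
        except TemporaryError:
            continue
    raise RuntimeError(f"drop failed after {max_retries} retries")
```

Capping the retry count turns an unbounded loop on a persistent "temporary" error into a reportable failure.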
ndb_restore --restore-epoch incorrectly reported the stop GCP as 1 less than the actual position.
(Bug #29343655)
A SavedEvent object in the CMVMI kernel block is written into a circular buffer. Such an object is split in two when wrapping at the end of the buffer; NDB looked beyond the end of the buffer instead of in the wrapped data at the buffer's beginning.
(Bug #29336793)
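A correct wrap-around read from a circular buffer looks like the following. This is a minimal generic sketch in Python, not the CMVMI code; it shows the split read the fix restores, where the tail of an object that wraps is taken from the start of the buffer rather than past its end.

```python
def read_wrapped(buf, start, length):
    """Read length bytes from a circular buffer, wrapping at the end.

    An object split at the end of the buffer continues at index 0;
    reading past the end instead of wrapping would return wrong data.
    """
    n = len(buf)
    start %= n
    if start + length <= n:
        # Object lies entirely within the buffer: single read.
        return bytes(buf[start:start + length])
    # Object wraps: read the head up to the end, then the tail
    # from the beginning of the buffer.
    head = n - start
    return bytes(buf[start:]) + bytes(buf[:length - head])
```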
NDB did not compile with -DWITH_SYSTEM_LIBS=ON due to an incorrectly configured dependency on zlib.
(Bug #29304517)
Removed a memory leak found when running ndb_mgmd --config-file after compiling NDB with Clang 7.
(Bug #29284643)
Removed clang compiler warnings caused by usage of extra ; characters outside functions; these are incompatible with C++98. (Bug #29227925)
Adding a column defined as TIMESTAMP DEFAULT CURRENT_TIMESTAMP to an NDB table is not supported with ALGORITHM=INPLACE. Attempting to do so now causes an error.
(Bug #28128849)
Added support which was missing in ndb_restore for conversions between the following sets of types:
(Bug #28074988)
Neither the MAX_EXECUTION_TIME optimizer hint nor the max_execution_time system variable was respected for DDL statements or queries against INFORMATION_SCHEMA tables while an NDB global schema lock was in effect.
(Bug #27538139)
DDL operations were not always performed correctly on database objects including databases and tables, when multi-byte character sets were used for the names of either or both of these. (Bug #27150334)
ndb_import did not always free up all resources used before exiting. (Bug #27130143)
NDBCLUSTER subscription log printouts provided only 2 words of the bitmap (in most cases containing 8 words), which made it difficult to diagnose schema distribution issues.
(Bug #22180480)
For certain tables with very large rows and a very large primary key, running START BACKUP SNAPSHOTEND while performing inserts into one of these tables, or START BACKUP SNAPSHOTSTART with concurrent deletes, could lead to data node errors.
As part of this fix, ndb_print_backup_file can now read backup files created in very old versions of NDB Cluster (6.3 and earlier); in addition, this utility can now also read undo log files. (Bug #94654, Bug #29485977)
When one of multiple SQL nodes which were connected to the cluster was down and then rejoined the cluster, or a new SQL node joined the cluster, this node did not use the data dictionary correctly, and thus did not always add, alter, or drop databases properly when synchronizing with the existing SQL nodes.
Now, during schema distribution at startup, the SQL node compares all databases on the data nodes with those in its own data dictionary. If any database on the data nodes is found to be missing from the SQL node's data dictionary, the SQL node installs it locally using CREATE DATABASE; the database is created using the default MySQL Server database properties currently in effect on this SQL node.
(WL #12731)
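The compare-and-install step described above can be sketched as a set difference followed by a create callback. This is an illustrative Python sketch only; the function names and the create_database callback (standing in for issuing CREATE DATABASE) are hypothetical.

```python
def missing_databases(on_data_nodes, local_dictionary):
    """Databases present in NDB but absent from the local dictionary."""
    return sorted(set(on_data_nodes) - set(local_dictionary))

def synchronize_databases(on_data_nodes, local_dictionary, create_database):
    """Install each missing database locally.

    create_database is a hypothetical callback that would issue
    CREATE DATABASE using the SQL node's default database properties.
    """
    created = []
    for db in missing_databases(on_data_nodes, local_dictionary):
        create_database(db)
        created.append(db)
    return created
```

Only the missing databases are touched; databases already known to the SQL node's data dictionary are left as they are.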