Changes in 24.4.9

The following changes were made in Oracle NoSQL Database Release 24.4.9 Enterprise Edition.

Bug and Performance Fixes

Made improvements for managing the TLS credentials files for a store.
Two new commands were added to the Admin CLI for managing new and current TLS credentials:
- The plan update-tls-credentials command can be used to retrieve and install updates to the set of shared TLS credentials used by SNAs in the store.
- The show tls-credentials command displays information for all SNAs about the TLS credentials currently installed as well as any updates waiting to be installed.

Also modified store services so that they can read updated credentials without requiring a restart.
  [KVSTORE-1979]

Fixed a bug that prevented a network restore from happening when reopening an environment after it crashed in the middle of a prior network restore.

This bug shows up as repeated LOG_FILE_NOT_FOUND or LOG_INTEGRITY errors every time the node attempts to restart, such as:

SEVERE com.sleepycat.je.EnvironmentFailureException: fetchIN 0x1225/0x313b4d9c parent IN=347606 IN class=com.sleepycat.je.tree.IN lastFullLsn=0x1226/0x396fe2af lastLoggedLsn=0x1226/0x396fe2af parent.getDirty()=false idKey=[194 133 0 116 100 97 56 52 112 100 114 102 54 116 100 97 56 52 ] state=0 expires=never LOG_FILE_NOT_FOUND: Log file missing, log is likely invalid. Environment is invalid and must be closed.
	at com.sleepycat.je.tree.IN.handleCorruption(IN.java:2940)
	at com.sleepycat.je.tree.IN.handleCleanedIN(IN.java:3065)
	at com.sleepycat.je.tree.IN.fetchINWithNoLatch(IN.java:2411)
	at com.sleepycat.je.tree.IN.fetchINWithNoLatch(IN.java:2234)
	at com.sleepycat.je.tree.Tree.getParentINForChildIN(Tree.java:1301)
	at com.sleepycat.je.recovery.RecoveryManager.recoverChildIN(RecoveryManager.java:1554)
	at com.sleepycat.je.recovery.RecoveryManager.recoverIN(RecoveryManager.java:1379)
	at com.sleepycat.je.recovery.RecoveryManager.replayOneIN(RecoveryManager.java:1333)
	at com.sleepycat.je.recovery.RecoveryManager.readNonRootINs(RecoveryManager.java:1296)
	at com.sleepycat.je.recovery.RecoveryManager.buildINs(RecoveryManager.java:1043)
	at com.sleepycat.je.recovery.RecoveryManager.buildTree(RecoveryManager.java:899)
	at com.sleepycat.je.recovery.RecoveryManager.recover(RecoveryManager.java:376)
	at com.sleepycat.je.dbi.EnvironmentImpl.finishInit(EnvironmentImpl.java:861)
	at com.sleepycat.je.dbi.DbEnvPool.openEnv(DbEnvPool.java:153)
	at com.sleepycat.je.Environment.makeEnvironmentImpl(Environment.java:284)
	at com.sleepycat.je.Environment.<init>(Environment.java:257)
	at com.sleepycat.je.rep.ReplicatedEnvironment.<init>(ReplicatedEnvironment.java:606)
	at com.sleepycat.je.rep.ReplicatedEnvironment.<init>(ReplicatedEnvironment.java:457)
	at oracle.kv.impl.rep.RepEnvHandleManager.openEnv(RepEnvHandleManager.java:918)
	at oracle.kv.impl.rep.RepEnvHandleManager.renewRepEnv(RepEnvHandleManager.java:743)
	at oracle.kv.impl.rep.RepNode.startup(RepNode.java:1003)
	at oracle.kv.impl.rep.RepNodeService.start(RepNodeService.java:680)
	at oracle.kv.impl.rep.RepNodeService.start(RepNodeService.java:622)
	at oracle.kv.impl.sna.ManagedRepNode$1.execute(ManagedRepNode.java:327)
	at oracle.kv.impl.fault.ProcessFaultHandler.execute(ProcessFaultHandler.java:179)
	at oracle.kv.impl.sna.ManagedRepNode.start(ManagedRepNode.java:323)
	at oracle.kv.impl.sna.ManagedService.main(ManagedService.java:785)

[KVSTORE-2494]
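
To check whether a node was affected, one option is to scan its log for the two error tokens named above. The following is a minimal sketch, not a supported tool: the class name, the method names, and the idea of a repeat threshold are assumptions made for illustration, and the log lines are supplied by the caller.

```java
import java.util.List;
import java.util.regex.Pattern;

/** Sketch only: counts log lines that carry the error tokens from this note. */
final class RestoreFailureScan {

    // The two tokens named in the release note; nothing else is matched.
    private static final Pattern MARKER =
        Pattern.compile("\\b(LOG_FILE_NOT_FOUND|LOG_INTEGRITY)\\b");

    /** Returns how many of the given log lines contain either token. */
    static long countFailures(List<String> logLines) {
        return logLines.stream()
            .filter(line -> MARKER.matcher(line).find())
            .count();
    }

    /**
     * Returns true when the tokens appear at least 'threshold' times.
     * The threshold is a hypothetical tuning knob, not a documented value.
     */
    static boolean looksLikeRepeatedFailure(List<String> logLines, int threshold) {
        return countFailures(logLines) >= threshold;
    }
}
```

A caller would read the node's log file into a list of lines and pass it in; a count that grows with every restart attempt matches the symptom described above.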

Fixed an issue where the client driver could log a message with the following stack trace even though it was a false alarm:

2024-10-13 21:59:32.767 UTC SEVERE - Uncaught exception in thread:KVoracle.kv.impl.async.dialog.nio.NioChannelThreadPool_2 at: java.base/java.lang.Thread.getStackTrace(Thread.java:2514)
        oracle.kv.impl.util.CommonLoggerUtils.getStackTrace(CommonLoggerUtils.java:74)
        oracle.kv.impl.util.CommonLoggerUtils.getStackTrace(CommonLoggerUtils.java:60)
        oracle.kv.KVStoreFactory$KVSHandler.uncaughtException(KVStoreFactory.java:383)
        oracle.kv.impl.api.parallelscan.BaseParallelScanIteratorImpl$Stream.lambda$0(BaseParallelScanIteratorImpl.java:952)
        ...

[KVSTORE-2268]

Fixed a bug where, after a transaction on the subscribed tables is aborted, the current stream position returned from NoSQLSubscription#getCurrentPosition() could remain blocked on a particular shard.
[KVSTORE-2452]

Fixed a bug affecting indexes created on only the key part of multi-region tables. Deletes and inserts on the multi-region table could result in either a NullPointerException or a SecondaryIntegrityException that would require the index to be dropped and re-created.
[KVSTORE-2485]

Fixed a bug where, if multi-region tables are deployed on three or more regions, the stream checkpoint table used by the cross-region service agent might be incorrectly dropped.
[KVSTORE-2406]

Fixed a bug where the Admin process could fail repeatedly and be unable to recover by restarting. The issue happens more often if tables are frequently created and deleted. This failure generates a stack trace similar to the following in the Admin log:

2024-08-19 22:39:01.127 UTC SEVERE [admin2] Process exiting
com.sleepycat.je.ThreadInterruptedException: (JE 24.3.6) 2(2):... java.lang.InterruptedException THREAD_INTERRUPTED: InterruptedException may cause incorrect internal state, unable to continue. Environment is invalid and must be closed.
	at com.sleepycat.je.ThreadInterruptedException.wrapSelf(ThreadInterruptedException.java:107)
	at com.sleepycat.je.dbi.EnvironmentImpl.checkIfInvalid(EnvironmentImpl.java:1773)
	at com.sleepycat.je.dbi.EnvironmentImpl.checkOpen(EnvironmentImpl.java:1782)
	at com.sleepycat.je.Environment.checkOpen(Environment.java:2395)
	at com.sleepycat.je.DbInternal.checkOpen(DbInternal.java:111)
	at com.sleepycat.je.rep.ReplicatedEnvironment.checkOpen(ReplicatedEnvironment.java:1039)
	at com.sleepycat.je.rep.ReplicatedEnvironment.getState(ReplicatedEnvironment.java:751)
	at oracle.kv.impl.admin.Admin.getReplicationMode(Admin.java:3251)
    ...
Caused by: com.sleepycat.je.ThreadInterruptedException: (JE 24.3.6) 2(2):... java.lang.InterruptedException THREAD_INTERRUPTED: InterruptedException may cause incorrect internal state, unable to continue. Environment is invalid and must be closed.
    ...
	at oracle.kv.impl.admin.AdminDatabase.get(AdminDatabase.java:240)
	at oracle.kv.impl.admin.GeneralStore.getParameters(GeneralStore.java:59)
	at oracle.kv.impl.admin.AdminStores.getParameters(AdminStores.java:179)
    ...
	at oracle.kv.impl.admin.Admin.getCurrentParameters(Admin.java:1240)
	at oracle.kv.impl.admin.AdminSecurity$AdminParamsHandle.getParameters(AdminSecurity.java:216)
	at oracle.kv.impl.security.login.ParamTopoResolver.getStorageNode(ParamTopoResolver.java:155)
	at oracle.kv.impl.security.login.ParamTopoResolver.getStorageNode(ParamTopoResolver.java:84)
	at oracle.kv.impl.security.login.InternalLoginManager.getHandle(InternalLoginManager.java:132)
	at oracle.kv.impl.api.AsyncRequestDispatcherImpl.withLoginHandle(AsyncRequestDispatcherImpl.java:695)
    ...
Caused by: java.lang.InterruptedException
	at java.base/java.util.concurrent.locks.ReentrantLock$Sync.tryLockNanos(ReentrantLock.java:167)
    ...
	at com.sleepycat.je.Database.get(Database.java:1261)
	at oracle.kv.impl.admin.AdminDatabase.get(AdminDatabase.java:240)
	at oracle.kv.impl.admin.GeneralStore.getParameters(GeneralStore.java:59)
	at oracle.kv.impl.admin.AdminStores.getParameters(AdminStores.java:179)
    ...

[KVSTORE-2425]

Fixed a bug where if a query has a UUID field with a variable in the 'where' clause, the UUID field is one of the primary key fields, and the full shard key is provided in the 'where' clause, then preparing the query may throw an IllegalArgumentException with the message "Invalid UUID string".
[KVSTORE-2451]

Added handling for cases where a prepared query is executed after (a) one or more of the referenced tables has been dropped and then re-created with a different schema, or (b) the index used by the query has been dropped and then re-created with a different schema, or (c) one or more of the referenced tables has been altered (via the alter table statement).
In case (a), the query throws an IllegalArgumentException. In cases (b) and (c), the query throws a PrepareQueryException indicating that the query must be prepared again before it can be executed.

[KVSTORE-2475]
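
An application that caches prepared queries may want to react to cases (b) and (c) by preparing again and retrying once. The following is a minimal sketch of that pattern under stated assumptions: PrepareQueryException here is a local stand-in class rather than the driver's own type, and CachedQuery, its Supplier, and its Function are hypothetical wrappers around whatever prepare and execute calls the application uses.

```java
import java.util.function.Function;
import java.util.function.Supplier;

/** Sketch only: every type below is a local stand-in, not the driver's API. */
final class RePrepareSketch {

    /** Stand-in for the driver exception that signals "prepare again". */
    static final class PrepareQueryException extends RuntimeException {
        PrepareQueryException(String message) {
            super(message);
        }
    }

    /** Caches a prepared handle of type P and re-prepares it once on demand. */
    static final class CachedQuery<P, R> {
        private final Supplier<P> preparer;     // hypothetical: wraps the prepare call
        private final Function<P, R> executor;  // hypothetical: wraps the execute call
        private P prepared;
        private int prepareCount;

        CachedQuery(Supplier<P> preparer, Function<P, R> executor) {
            this.preparer = preparer;
            this.executor = executor;
        }

        R run() {
            if (prepared == null) {
                prepare();
            }
            try {
                return executor.apply(prepared);
            } catch (PrepareQueryException stale) {
                // Cases (b) and (c): the cached handle is stale, so prepare
                // again and retry exactly once. IllegalArgumentException
                // (case (a)) is deliberately not caught and propagates.
                prepare();
                return executor.apply(prepared);
            }
        }

        int prepareCount() {
            return prepareCount;
        }

        private void prepare() {
            prepared = preparer.get();
            prepareCount++;
        }
    }
}
```

Retrying only once keeps a persistent schema mismatch from turning into a loop; case (a) is left to the caller because the re-created table may no longer fit the query at all.
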

Fixed a query bug that would cause a query with 'order by' to return the results in the wrong order. The bug occurs when (a) the query has nested tables with descendant tables (or left outer joins), (b) the query specifies a complete primary key for the target table, (c) the query orders by fields in a descendant table, and (d) the ordering is descending. For example, the following query, where column 'ida' is the primary key of table A, returned its results in ascending order, instead of the requested descending order:
```
select a.ida as a_ida, b.ida as b_ida, b.idb as b_idb
from nested tables(A a descendants(A.B b))
where a.ida = 40
order by b.idb desc
```
[KVSTORE-2462]
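
To make the expected result concrete, the sketch below sorts a few in-memory rows the way the query asks for and checks the order. It is an illustration only: the Row class and both helper methods are hypothetical and run entirely on the client, without touching the store.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Sketch only: models the three projected columns of the example query. */
final class OrderByDescSketch {

    /** Hypothetical row shape: a_ida, b_ida, b_idb. */
    static final class Row {
        final int aIda;
        final int bIda;
        final int bIdb;

        Row(int aIda, int bIda, int bIdb) {
            this.aIda = aIda;
            this.bIda = bIda;
            this.bIdb = bIdb;
        }
    }

    /** Returns a copy of the rows ordered by b_idb, highest value first. */
    static List<Row> orderByIdbDesc(List<Row> rows) {
        List<Row> sorted = new ArrayList<>(rows);
        sorted.sort(Comparator.comparingInt((Row r) -> r.bIdb).reversed());
        return sorted;
    }

    /** Returns true when b_idb never increases from one row to the next. */
    static boolean isDescendingByIdb(List<Row> rows) {
        for (int i = 1; i < rows.size(); i++) {
            if (rows.get(i - 1).bIdb < rows.get(i).bIdb) {
                return false;
            }
        }
        return true;
    }
}
```

A check like isDescendingByIdb could be run over the rows returned by the query to confirm the requested descending order.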

Fixed a bug where client requests could fail with a SecondaryIntegrityException and a stack trace like the following:

Caused by: com.sleepycat.je.SecondaryIntegrityException: (JE 24.3.4) Secondary is corrupt: the primary record contains a key that is not present in the secondary secDbName=IdxIndex.DataCheck priDbName=p173 expiration=2026-10-28.00 NOT_EXTINCT priLsn=0x4/0xaa1fe8a
    at com.sleepycat.je.SecondaryDatabase.deleteKey(SecondaryDatabase.java:1526)
    at com.sleepycat.je.SecondaryDatabase.updateSecondary(SecondaryDatabase.java:1347)
    at com.sleepycat.je.Cursor.putNotify(Cursor.java:2826)
    at com.sleepycat.je.Cursor.putNoDups(Cursor.java:2640)
    at com.sleepycat.je.Cursor.putInternal(Cursor.java:2538)
    at com.sleepycat.je.Cursor.putInternal(Cursor.java:846)
    at com.sleepycat.je.Database.put(Database.java:1435)
    at oracle.kv.impl.rep.migration.MigrationTarget$Reader$CopyOp.execute(MigrationTarget.java:1545)
    at oracle.kv.impl.rep.migration.MigrationTarget.consumeOps(MigrationTarget.java:998)

The issue only happens when a concurrent elasticity operation on the server side fails and in turn causes a secondary integrity issue. One likely trigger for such a failure is a disk I/O error, which may produce server-side log messages like the following:

java.io.IOException: No space left on device LOG_WRITE: IOException on write,
log is likely incomplete. Environment is invalid and must be closed.

The issue may cause multiple client operations to fail. The store may recover on its own, with the elasticity operation resuming on healthy nodes and the secondary index being rebuilt.

[KVSTORE-2372]

Fixed a problem that could cause the Admin CLI ping or verify configuration commands to report incorrect status for Rep Nodes or Arbiters in rare cases when used in store configurations with storage nodes that host multiple services of the same type.
[KVSTORE-2383]