Changes in 24.3.9

The following changes were made in Oracle NoSQL Database Release 24.3.9 Enterprise Edition.

New Features

  1. Added support for elasticity operations in the presence of multiple subscribers to the Streams API. This means that writes to the same key will be streamed and delivered to the application in the correct order, even if data was moved as a result of an elasticity operation.

    With the new feature, the checkpoint table mapper is required for multiple sharded subscribers, and an exception will be thrown if the stream is configured without a mapper.
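
    These notes do not show the shape of the new mapper API; as a minimal sketch, assuming a Builder overload that accepts a function from subscriber ID to checkpoint table name, a two-subscriber configuration might look like the following. The mapper argument is an assumption for illustration only; NoSQLSubscriptionConfig, NoSQLSubscriberId, and the other builder calls are part of the existing Streams API.

    import oracle.kv.pubsub.NoSQLSubscriberId;
    import oracle.kv.pubsub.NoSQLSubscriptionConfig;

    // Sketch: configure subscriber 0 of a group of 2 sharded subscribers.
    // The mapper below (an assumption, not the confirmed API) gives each
    // subscriber its own checkpoint table, as required by this feature.
    NoSQLSubscriptionConfig config =
        new NoSQLSubscriptionConfig.Builder(
                sid -> "StreamCheckpoint_" + sid.getIndex())
            .setSubscriberId(new NoSQLSubscriberId(2 /* total */, 0 /* index */))
            .setSubscribedTables("users")
            .build();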

    [KVSTORE-1153]
  2. Added support for cross-region elasticity operations with multi-region tables. This means that multi-region tables will behave correctly while one or more stores are undergoing an elasticity operation (expansion or contraction). When a user configures a group of cross-region service agents connecting different regions, multi-region tables now support elasticity operations in any region. As a result, data in a multi-region table will be eventually consistent in the presence of elasticity operations.

    [KVSTORE-2151]

  3. Implemented new Timestamp SQL functions: timestamp_ceil, timestamp_floor, timestamp_round, timestamp_trunc, timestamp_bucket, format_timestamp, parse_to_timestamp, quarter, day_of_week, day_of_month, day_of_year, and to_last_day_of_month.
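
    As a brief illustration, some of the new functions can be used in a query issued through the Java driver. This is a sketch only: the store name, host, port, table, and column are hypothetical, and the argument lists shown for the new functions should be checked against the SQL reference.

    import oracle.kv.KVStore;
    import oracle.kv.KVStoreConfig;
    import oracle.kv.KVStoreFactory;
    import oracle.kv.StatementResult;
    import oracle.kv.table.RecordValue;

    // Sketch: query a hypothetical "events" table with a TIMESTAMP column "ts".
    KVStore store = KVStoreFactory.getStore(
        new KVStoreConfig("kvstore", "localhost:5000"));
    StatementResult result = store.executeSync(
        "SELECT format_timestamp(e.ts, 'yyyy-MM-dd') AS day, " +
        "       quarter(e.ts) AS qtr, " +
        "       day_of_week(e.ts) AS dow " +
        "FROM events e");
    for (RecordValue row : result) {
        System.out.println(row);
    }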

    [KVSTORE-1157]

  4. Added support for SSL credentials login in the administrative shell command line. The SSL credentials must come from the server-side security directory, which can be specified with -store-security-dir in the administrative CLI. Administrators who have access to the server security directory can log in to the store to reset the password when it has been forgotten.
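
    For example, an administrator with access to the server security directory might start the shell as follows (host, port, and directory paths are illustrative):

    java -jar KVHOME/lib/kvstore.jar runadmin -host node01 -port 5000 \
        -store-security-dir KVROOT/security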

    [KVSTORE-1672]

  5. Added new statistics for data verification.

    New statistics are:

    • nDbVerifyRuns: Number of times that DbVerify has been called.
    • DbVerifyRunTime: Total run time of DbVerify.
    • nDbVerifyProblemsFound: Total number of problems that DbVerify found among all executions.

    [KVSTORE-582]

Bug and Performance Fixes

  1. Fixed a bug where, if multi-region tables were deployed in three or more regions, the stream checkpoint table used by the cross-region service agent might be incorrectly dropped.

    [KVSTORE-2406]

  2. Fixed a bug that could cause a query with an index-based order-by and an IN predicate to sort its results in the wrong order. For example, the following query would return wrongly sorted results if there was an index on info.bar:
    select id, f.info.bar
    from foo f
    where f.info.bar in (6, 3, 7)
    order by f.info.bar

    With the bug, all the results with info.bar = 6 would appear first, then all the results with info.bar = 3, and finally all the results with info.bar = 7, following the order of the IN list instead of the ascending order requested by the order-by clause.

    [KVSTORE-2409]

  3. Fixed an issue where a select-and-modify query execution failed to make the expected modification because it could not find the specified rows. For example, a delete query might have failed to delete certain rows even though the rows were present. This issue was triggered in a very rare situation in which a server-side master transfer raced with the query execution. There was no data corruption on the server side, and retrying the operation would return the correct answer unless the race was encountered again.

    [KVSTORE-2351]

  4. Fixed a bug where, if a multi-region table was missing in the remote regions, the XRegionService might get stuck and be unable to serve future requests.

    [KVSTORE-2346]

  5. Fixed a bug with index metadata initialization. The problem could occur if index metadata (i.e., IndexImpl instances) was accessed by multiple threads in parallel while it was being initialized. The bug could cause the wrong query execution plan to be generated, and possibly cause other operations to fail as well. This is a continuation of KVSTORE-2185: the fix made for this bug in release 24.1 was not quite correct.

    [KVSTORE-2185]

  6. Fixed a bug where, if the stream underlying multi-region tables could not resume from a checkpoint because the checkpoint was unavailable in the remote region, rows in the multi-region tables that were deleted in the remote region might not be deleted locally.

    [KVSTORE-986]

  7. Fixed a bug where the Streams API might fail to recognize the following two security properties when the password is stored in external password storage: (1) oracle.kv.ssl.trustStorePassword and (2) oracle.kv.ssl.trustStorePasswordAlias.

    [KVSTORE-2364]

  8. Fixed a bug where the multiDelete operation on a multi-region table with a child table might fail to delete ancestor keys.

    [KVSTORE-2224]

  9. Fixed an issue that could cause an OutOfMemoryError on replication nodes and storage nodes. When this failure occurs, a heap dump should reveal a huge number of objects with the prefix NioEndpointHandler$InCompletionHandler. Most likely, a thread dump would also reveal a blocking event (e.g., a DNS resolution or a wait on a synchronization) on the thread named KVNioEndpointGroup.backup_1.

    [KVSTORE-2260]

  10. Improved ping output to include details when a service is disabled. Ping text output now includes the (Stopped) status if the service is disabled by the user. The new ping output looks like the following:
    Ping command output: Pinging components of store kvstore based upon topology sequence #18
    10 partitions and 3 storage nodes
    Time: 2024-05-01 16:59:48 UTC   Version: 24.2.0
    Shard Status: healthy: 0 writable-degraded: 0 read-only: 0 offline: 1 total: 1
    Admin Status: writable-degraded
    Zone [name=Zone1 id=zn1 type=PRIMARY allowArbiters=true masterAffinity=false]   RN Status: online: 0 read-only: 0 offline: 2
    Storage Node [sn1] on localhost: 5001    Zone: [name=Zone1 id=zn1 type=PRIMARY allowArbiters=true masterAffinity=false]    Status: RUNNING   Ver: 24.2.0 2024-05-01 16:57:37 UTC  Build id: 6e05b77646a7 Edition: Enterprise    isMasterBalanced: true     serviceStartTime: 2024-05-01 16:59:33 UTC
                    Admin [admin1]                              Status: RUNNING,MASTER        serviceStartTime: 2024-05-01 16:59:36 UTC                stateChangeTime: 2024-05-01 16:59:36 UTC   availableStorageSize: 2 GB
                    Rep Node [rg1-rn1]        Status: UNREACHABLE (Stopped)
    Storage Node [sn2] on localhost: 5021    Zone: [name=Zone1 id=zn1 type=PRIMARY allowArbiters=true masterAffinity=false]    Status: RUNNING   Ver: 24.2.0 2024-05-01 16:57:37 UTC  Build id: 6e05b77646a7 Edition: Enterprise    isMasterBalanced: true     serviceStartTime: 2024-05-01 16:59:34 UTC
                    Admin [admin2]                              Status: RUNNING,REPLICA       serviceStartTime: 2024-05-01 16:59:39 UTC                stateChangeTime: 2024-05-01 16:59:39 UTC   availableStorageSize: 2 GB
                    Rep Node [rg1-rn2]        Status: UNREACHABLE (Stopped)
    Storage Node [sn3] on localhost: 5041    Zone: [name=Zone1 id=zn1 type=PRIMARY allowArbiters=true masterAffinity=false]    Status: RUNNING   Ver: 24.2.0 2024-05-01 16:57:37 UTC  Build id: 6e05b77646a7 Edition: Enterprise    isMasterBalanced: unknown          serviceStartTime: 2024-05-01 16:59:35 UTC
                    Admin [admin3]                              Status: UNREACHABLE (Stopped)         serviceStartTime: ?                stateChangeTime: ?      availableStorageSize: ?
                    Arb Node [rg1-an1]        Status: UNREACHABLE (Stopped)

    Ping JSON output now lists 'expectedStatus' as 'UNREACHABLE' if the service is disabled by the user. The new JSON output looks like the following:

    {
      "topology": {
        ...
      },
      "adminStatus": "",
      "shardStatus": {
       ...
      },
      "zoneStatus": [
        ...
      ],
      "snStatus": [
        {
          ...,
          "rnStatus": [
            {
              "resourceId": "",
              "status": "",
              "expectedStatus": "UNREACHABLE"
            }
          ],
          "anStatus": []
        },
        {
          ...
          "rnStatus": [
            {
              "resourceId": "",
              "status": "",
              "expectedStatus": "UNREACHABLE"
            }
          ],
          "anStatus": []
        },
        {
         ...
          "rnStatus": [],
          "anStatus": [
            {
              "resourceId": "",
              "status": "",
              "expectedStatus": "UNREACHABLE"
            }
          ]
        }
      ]
    }

    [KVSTORE-2072]

  11. Fixed an issue where queries executed concurrently with an elasticity operation involving the addition of new replication nodes might encounter an exception with a message that looks like the following:
    java.lang.IllegalStateException: oracle.kv.impl.query.QueryStateException: Unexpected state in query engine:
    Failed to read base topology with sequence number 1524
    Stack trace: java.lang.RuntimeException: oracle.kv.impl.query.QueryStateException: Unexpected state in query engine:
    Failed to read base topology with sequence number 1524, got null result

    [KVSTORE-2166]

  12. Fixed a bug that caused a RequestTimeoutException during store contraction. The RequestTimeoutException would have a stack trace like the following, listing the target replication nodes being removed during the contraction.
    oracle.kv.RequestTimeoutException: Request dispatcher: c1609890242275412734, dispatch timed out after 0 attempt(s) to dispatch the request to an RN. Target: rg3-rn1 (24.1.0) on [2023-11-29 21:18:01.735 UTC]. Timeout: 19999ms
    Fault class name: oracle.kv.RequestTimeoutException
    Dispatch event trace: (DISPATCH_REQUEST_RESPONDED, 2023-11-29 21:18:01.424 UTC, rg3-rn1, null)(OBTAIN_REQUEST_HANDLER, 2023-11-29 21:18:01.552 UTC, rg3-rn3, null)(OBTAIN_REQUEST_HANDLER_RESPONDED, 2023-11-29 21:18:01.552 UTC, rg3-rn3, null)(DISPATCH_REQUEST_RESPONDED, 2023-11-29 21:18:01.552 UTC, rg3-rn3, null)(OBTAIN_REQUEST_HANDLER, 2023-11-29 21:18:01.552 UTC, rg3-rn1, null)(OBTAIN_REQUEST_HANDLER_RESPONDED, 2023-11-29 21:18:01.552 UTC, rg3-rn1, null)(DISPATCH_REQUEST_RESPONDED, 2023-11-29 21:18:01.552 UTC, rg3-rn1, null)(OBTAIN_REQUEST_HANDLER, 2023-11-29 21:18:01.552 UTC, rg3-rn2, null)(OBTAIN_REQUEST_HANDLER_RESPONDED, 2023-11-29 21:18:01.552 UTC, rg3-rn2, null)(DISPATCH_REQUEST_RESPONDED, 2023-11-29 21:18:01.552 UTC, rg3-rn2, null)(OBTAIN_REQUEST_HANDLER, 2023-11-29 21:18:01.681 UTC, rg3-rn3, null)(OBTAIN_REQUEST_HANDLER_RESPONDED, 2023-11-29 21:18:01.681 UTC, rg3-rn3, null)(DISPATCH_REQUEST_RESPONDED, 2023-11-29 21:18:01.681 UTC, rg3-rn3, null)(OBTAIN_REQUEST_HANDLER, 2023-11-29 21:18:01.681 UTC, rg3-rn1, null)(OBTAIN_REQUEST_HANDLER_RESPONDED, 2023-11-29 21:18:01.681 UTC, rg3-rn1, null)(DISPATCH_REQUEST_RESPONDED, 2023-11-29 21:18:01.681 UTC, rg3-rn1, null).
          at oracle.kv.impl.api.RequestDispatcherImpl.getTimeoutException(RequestDispatcherImpl.java:1270)
          at oracle.kv.impl.api.AsyncRequestDispatcherImpl$AsyncExecuteRequest.run(AsyncRequestDispatcherImpl.java:458)
          at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539)
          at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
          at oracle.kv.impl.async.dialog.nio.NioChannelExecutor$WrappedFutureTask.run(NioChannelExecutor.java:1326)
          at oracle.kv.impl.async.dialog.nio.NioChannelExecutor$ScheduledFutureTask.run(NioChannelExecutor.java:1434)
          at oracle.kv.impl.async.dialog.nio.NioChannelExecutor.runTasks(NioChannelExecutor.java:2030)
          at oracle.kv.impl.async.dialog.nio.NioChannelExecutor.runOnce(NioChannelExecutor.java:1107)
          at oracle.kv.impl.async.dialog.nio.NioChannelExecutor.lambda$run$13(NioChannelExecutor.java:1060)
          at oracle.kv.impl.fault.AsyncEndpointGroupFaultHandler.lambda$static$0(AsyncEndpointGroupFaultHandler.java:21)
          at oracle.kv.impl.async.dialog.nio.NioChannelExecutor.run(NioChannelExecutor.java:1057)
          at java.base/java.lang.Thread.run(Thread.java:842)

    [KVSTORE-2175]

  13. Fixed an issue that could generate a misleading warning message in the logging output if an application had logging enabled. The message only occurred when a network connection handshake timed out. It was related to an internal issue that has been fixed and had no impact on applications. The message looked like the following:
    2024-02-27 22:40:39.488 UTC WARNING - Failed to complete future using exception oracle.kv.impl.async.exception.PersistentDialogException: Problem with channel (nosql-vm-test
    -e3-3:40687 (closed)): Connect timeout, handshake is not done within 3000 ms since the endpoint handler is created, endpointHandlerId=680d54f747e78e31, future was completed
    exceptionally: oracle.kv.RequestTimeoutException: Dialog timed out locally. Dialog context before abort: DialogContext[ dialogId=0:680d54f747e78e31 contextId=1 dialogType=40
    1 dialogHandler=AsyncVersionedRemoteDialogInitiator@d06ad71[methodCall=AsyncRequestHandler.ExecuteCall[NOP]] onCreatingEndpoint=true initTimeMillis=22:40:36.488 UTC latencyM
    s=-0.00 timeout=1024 state=INITED_NEED_DIALOGSTART writeState=0 abortInfo=null infoPerf=not sampled] (24.1.11) on [2024-02-27 22:40:37.512 UTC]. Timeout: 1024ms
    Fault class name: oracle.kv.RequestTimeoutException
    java.lang.Throwable
            at oracle.kv.impl.async.FutureUtils.checkedCompleteExceptionally(FutureUtils.java:278)
            at oracle.kv.impl.async.AsyncVersionedRemoteDialogInitiator.onAbort(AsyncVersionedRemoteDialogInitiator.java:229)
            at oracle.kv.impl.async.NullDialogStart.doFail(NullDialogStart.java:104)
            at oracle.kv.impl.async.NullDialogStart.fail(NullDialogStart.java:74)
            at oracle.kv.impl.async.dialog.AbstractDialogEndpointHandler.abortDialogs(AbstractDialogEndpointHandler.java:2135)
            at oracle.kv.impl.async.dialog.AbstractDialogEndpointHandler.terminate(AbstractDialogEndpointHandler.java:1354)
            at oracle.kv.impl.async.dialog.AbstractDialogEndpointHandler$ConnectTimeoutTask.run(AbstractDialogEndpointHandler.java:2437)
            at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:572)
            at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:317)
            at oracle.kv.impl.async.dialog.nio.NioChannelExecutor$WrappedFutureTask.run(NioChannelExecutor.java:1326)
            at oracle.kv.impl.async.dialog.nio.NioChannelExecutor$ScheduledFutureTask.run(NioChannelExecutor.java:1434)
            at oracle.kv.impl.async.dialog.nio.NioChannelExecutor.runTasks(NioChannelExecutor.java:2030)
            at oracle.kv.impl.async.dialog.nio.NioChannelExecutor.runOnce(NioChannelExecutor.java:1107)
            at oracle.kv.impl.async.dialog.nio.NioChannelExecutor.lambda$run$13(NioChannelExecutor.java:1060)
            at oracle.kv.impl.fault.AsyncEndpointGroupFaultHandler.lambda$static$0(AsyncEndpointGroupFaultHandler.java:21)
            at oracle.kv.impl.async.dialog.nio.NioChannelExecutor.run(NioChannelExecutor.java:1057)
            at java.base/java.lang.Thread.run(Thread.java:1583)

    [KVSTORE-2228]

  14. Fixed an issue in the admin command 'plan verify-data' where verification errors were not reported when the command was run without the -json flag. Additionally, if the verify-data plan reports btree or log file corruption, the plan now fails. This fix introduces two new error messages:

    1. NOSQL_5600: RN Btree corruption only
    2. NOSQL_5601: RN Logfile corruption or RN Logfile/Btree corruption

    [KVSTORE-1747]

  15. Fixed an issue where the command history feature of the shell command line did not work properly if the shell was started with the user password specified manually.

    [KVSTORE-2259]

  16. Fixed an issue where clients were unable to connect to the store using the Java API call KVStoreFactory.getStore() when the client helper hosts were storage nodes that hosted only admin nodes, and no replication nodes or arbiter nodes. For example, obtaining the store handle using an admin-only SN running on localhost:5000 for a non-secure store would lead to the following exception:
    oracle.kv.FaultException: Could not contact any RepNode at: [localhost:5000] (24.1.11) on [2024-07-09 07:00:38.754 UTC]
    
    And in the case of a secure store:
    oracle.kv.FaultException: Could not establish an initial login from: [localhost:5000] (24.1.11) on [2024-07-09 07:01:06.930 UTC]
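
    For reference, the store handle in this scenario is obtained as in the following minimal sketch (store name, host, and port are illustrative); with this fix, the call succeeds even when the helper host is an admin-only storage node:

    import oracle.kv.KVStore;
    import oracle.kv.KVStoreConfig;
    import oracle.kv.KVStoreFactory;

    KVStore store = KVStoreFactory.getStore(
        new KVStoreConfig("kvstore", "localhost:5000"));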
    

    [KVSTORE-1695]

  17. Fixed an issue with the javaRnParamsOverride parameter, which is used to override Replication Node JVM arguments. The user can set the parameter and run a "change-parameters" plan to set the value in the store. In some cases, the store did not process javaRnParamsOverride correctly, resulting in an unrecoverable error when a replication node started. The error looked like this:
    ProcessMonitor: Unexpected exception in MonitorThread: java.lang.NumberFormatException: For input string: "11253m-xx:parallel"java.lang.NumberFormatException: For input string: "11253m-xx:parallel"
    at java.base/java.lang.NumberFormatException.forInputString(NumberFormatException.java:67)
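
    As an illustration only, the parameter is typically set with a change-parameters plan in the admin CLI; the JVM argument values below are hypothetical, and the exact quoting rules should be checked against the admin CLI reference:

    kv-> plan change-parameters -service rg1-rn1 -wait \
        -params "javaRnParamsOverride=-Xmx11253m -XX:+UseParallelGC"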

    [KVSTORE-2257]

  18. Node restart will no longer fail if a persistent error, such as a checksum error, is discovered. This was done to increase node availability.

    [KVSTORE-2317]

API Changes

  1. Removed various deprecated JE configuration parameters. If one of these deprecated configurations is found in the je.properties file, it will be ignored and a warning will be printed in the logs.

    Removed configurations are:

    • je.evictor.wakeupInterval
    • je.evictor.useMemoryFloor
    • je.evictor.nodeScanPercentage
    • je.evictor.evictionBatchPercentage
    • je.maxOffHeapMemory
    • je.offHeap.evictByte
    • je.offHeap.checksum
    • je.env.runOffHeapEvictor
    • je.offHeap.coreThreads
    • je.offHeap.maxThreads
    • je.offHeap.keepAlive
    • je.checkpointer.wakeupInterval
    • je.cleaner.minFilesToDelete
    • je.cleaner.retries
    • je.cleaner.restartRetries
    • je.cleaner.calc.recentLNSizes
    • je.cleaner.calc.minUncountedLNs
    • je.cleaner.calc.initialAdjustments
    • je.cleaner.calc.minProbeSkipFiles
    • je.cleaner.calc.maxProbeSkipFiles
    • je.cleaner.cluster
    • je.cleaner.clusterAll
    • je.cleaner.rmwFix
    • je.txn.serializableIsolation
    • je.lock.oldLockExceptions
    • je.log.groupCommitInterval
    • je.log.groupCommitThreshold
    • je.log.useNIO
    • je.log.directNIO
    • je.log.chunkedNIO
    • je.nodeDupTreeMaxEntries
    • je.tree.maxDelta
    • je.compressor.purgeRoot
    • je.evictor.deadlockRetry
    • je.cleaner.adjustUtilization
    • je.cleaner.maxBatchFiles
    • je.cleaner.foregroundProactiveMigration
    • je.cleaner.backgroundProactiveMigration
    • je.cleaner.lazyMigration
    • je.rep.preserveRecordVersion
    • je.rep.minRetainedVLSN
    • je.rep.repStreamTimeout
    • je.rep.replayCostPercent
    • java.util.logging.FileHandler.on
    • java.util.logging.ConsoleHandler.on
    • java.util.logging.DbLogHandler.on
    • java.util.logging.level.lockMgr
    • java.util.logging.level.recovery
    • java.util.logging.level.evictor
    • java.util.logging.level.cleaner

    [KVSTORE-1788]