Known Issues

Long Sync-ups Can Produce RN Failures

In some cases, the sync-up process that replication nodes use to replicate data from the master node to a replica can take an unexpectedly long time to find the initial sync-up point. If this search exceeds the default 5-second timeout, the replica restarts and attempts the sync-up again. If the search keeps taking longer than 5 seconds, the replica is restarted repeatedly and cannot rejoin the shard.

To determine if this problem is occurring, first use the ping command to check whether shards are healthy. If a shard continues to be unhealthy, check the RN logs for messages like the following:

2022-01-12 18:35:11.087 UTC INFO [rg13-rn2] JE: Inactive channel: NullNode(-1) forced close. Timeout: 5000ms.

To work around this problem, increase the timeout so that the sync-up has enough time to complete. To do this, modify the configProperties RN parameter for all RNs to include a je.rep.preHeartbeatTimeoutMs setting that specifies the longer timeout. Note that this value specifies a time in microseconds, not milliseconds, despite the "Ms" suffix in its name. If the configProperties parameter already has a non-empty value, add the new setting to the existing settings, separated by a semicolon. For example, to increase the timeout to 60 seconds, assuming that the configProperties parameter was previously empty, run the following Admin CLI command for each RN. For RN rg1-rn1, the command is:

kv-> plan change-parameters -service rg1-rn1 -wait -params "configProperties=je.rep.preHeartbeatTimeoutMs 60000000"
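Because it is easy to supply milliseconds by mistake, the following Java sketch shows the unit arithmetic: it converts a timeout in seconds to microseconds with java.util.concurrent.TimeUnit and builds the -params argument in the same form as the command above. The class and method names are hypothetical; the sketch only prints the argument and does not contact the store.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical helper for computing the je.rep.preHeartbeatTimeoutMs value.
class PreHeartbeatTimeout {
    // Default sync-up timeout reported in the RN log ("Timeout: 5000ms"), in microseconds.
    static final long DEFAULT_TIMEOUT_MICROS = TimeUnit.MILLISECONDS.toMicros(5000);

    /** Converts seconds to microseconds, the unit this parameter expects. */
    static long toMicros(long seconds) {
        return TimeUnit.SECONDS.toMicros(seconds);
    }

    /** Builds the -params argument in the same form as the documented command. */
    static String paramsArg(long seconds) {
        long micros = toMicros(seconds);
        if (micros <= DEFAULT_TIMEOUT_MICROS) {
            throw new IllegalArgumentException("timeout must be longer than the 5 second default");
        }
        return "configProperties=je.rep.preHeartbeatTimeoutMs " + micros;
    }

    public static void main(String[] args) {
        // 60 seconds -> 60,000,000 microseconds
        System.out.println(paramsArg(60));
    }
}
```

Passing 60000 instead would mean 60 milliseconds, which is shorter than the default and would not help.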

[KVSTORE-2096]

Performance Issues with Virtual Threads

The Direct Java Driver API has not yet been optimized for calls made from the virtual threads introduced in Java 21, so calling API methods from virtual threads may result in poor performance. These performance issues will be addressed in a future release.

[KVSTORE-2121]

Topology Changes May Fail During Software Upgrades

Making modifications to the store topology that include partition migration may fail if the modifications are performed while the store is being upgraded to a new software version. If you run a plan to deploy a new topology and the plan fails with problems during partition migration, check if the nodes of the store are running different software versions, and upgrade any nodes running old versions before retrying the plan.

Modifying a topology using one of the following topology commands can result in the need for partition migration. Deploying the resulting topology with the 'plan deploy-topology' command can then fail if the plan is performed during a store software version upgrade. The topology commands that can produce partition migrations are:

topology change-repfactor
topology contract
topology rebalance
topology redistribute

Other topology commands do not produce partition migration and do not cause this problem.

If a topology deployment fails, you can tell if it is related to partition migrations during a software version upgrade by looking for errors like the following:

Plan 24 ended with errors. Use "show plan -id 24" for more information
Plan Deploy Topo
Id:                    24
State:                 ERROR
Attempt number:        1
Started:               2020-04-10 15:19:59 UTC
Ended:                 2020-04-10 15:24:48 UTC
Plan failures:
	Failure 1: 17/MigratePartition PARTITION-2 from rg1 to rg2
	failed. target=rg2-rn1 state=ERROR java.lang.Exception:
	Migration of PARTITION-2 failed. Giving up after 10 attempt(s)

If you see a plan failure involving partition migrations like this, particularly if there are similar failures for all partition migration tasks, use the 'ping' or 'verify topology' command to display information about the store, and check whether different storage nodes are running different major or minor software versions. If so, upgrade the nodes running older software to the latest version before retrying the 'plan deploy-topology' command.
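When a store has many storage nodes, you can copy the "Ver:" value that ping reports for each storage node and compare them programmatically. The following Java sketch is one way to do that, assuming version strings of the form major.minor.patch (for example, 18.3.0, as in the ping output shown elsewhere in these notes). The class name is hypothetical, and the sketch does not query the store itself.

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper: compares storage node versions copied from ping output.
class VersionCheck {
    /** Returns the "major.minor" prefix of a version such as "18.3.0". */
    static String majorMinor(String version) {
        String[] parts = version.trim().split("\\.");
        if (parts.length < 2) {
            throw new IllegalArgumentException("unexpected version: " + version);
        }
        return parts[0] + "." + parts[1];
    }

    /** True if every storage node reports the same major and minor version. */
    static boolean sameMajorMinor(List<String> snVersions) {
        Set<String> distinct = new TreeSet<>();
        for (String v : snVersions) {
            distinct.add(majorMinor(v));
        }
        return distinct.size() <= 1;
    }

    public static void main(String[] args) {
        List<String> versions = List.of("18.3.0", "18.3.2", "18.1.8");
        // 18.1 differs from 18.3, so the older nodes should be upgraded first.
        System.out.println(sameMajorMinor(versions));
    }
}
```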

Enterprise Manager plug-in not compatible with EM 13.4.0.0 and later

Oracle NoSQL's Enterprise Manager (EM) plug-in is compatible with EM versions up to and including 13.3.0.0. Because of architectural changes in EM's plug-in support, the plug-in is not compatible with EM version 13.4.0.0 or later versions.

[KVSTORE-141]

Limitations on Multi-Region Tables in This Release

The Multi-Region Tables feature in this release has the following limitations:

Specifying a non-zero TTL when inserting or updating a row in a Multi-Region table is only supported after upgrading the driver, and may fail until the local store has been completely upgraded. In addition, TTL expiration times will be lost when rows are replicated to a remote region if the multi-region agent or the store for that region has not been upgraded. [#28165]
The import, export, and snapshot commands should not be used to restore multi-region tables. The commands do not currently account for region information or modification times, so using these commands to restore a multi-region table to the contents from an earlier time may produce inconsistent results.
[KVSTORE-444]

Updating Java Memory Settings after Release 18.1 Workaround

Starting with release 18.3, the Java heap overhead is explicitly accounted for by the new Storage Node parameter jvmOverheadPercent, which has a default value of 25%. If you are running a store with a version earlier than 18.3, and the store was configured with the workarounds suggested in the "Memory Allocation Algorithm Fails to Account for Java Memory Overhead Can Produce OutOfMemoryErrors" section of the 18.1 release notes, then make the following changes during the upgrade to release 18.3 or later. The changes to make depend on whether you followed the first or the second set of workarounds, which in turn depended on whether your configuration has more than 48 GiB of memory per RN.

If you used the first set of instructions in the release notes because your configuration has no more than 48 GiB of memory per RN, then immediately before upgrading the store to release 18.3 or later, run the following Admin CLI commands:

```
change-policy -params rnHeapPercent=68
```

For each storage node, replacing snX as appropriate:

plan change-parameters -service snX -wait -params rnHeapPercent=68

After the upgrade, run the following Admin CLI command for each storage node, replacing snX as appropriate:
```
plan change-parameters -service snX -wait -params memoryMB=0
```

You are done.

If you used the second set of instructions in the release notes because your configuration has more than 48 GiB of memory per RN, then run the following Admin CLI commands after the upgrade:

```
change-policy -params systemPercent=10
```

For each storage node, replacing snX as appropriate:

plan change-parameters -service snX -wait -params systemPercent=10 memoryMB=0

You are done.
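The choice between the two procedures above depends only on the memory per RN, so it can be sketched as a small Java helper that prints the Admin CLI commands for each phase. This is an illustrative sketch, not an Oracle tool: the class name is hypothetical, the memory value (in whole GiB) and storage node IDs are inputs you supply, and the command strings are copied from the procedures above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper that lists the commands for the 18.1 memory workaround cleanup.
class MemoryWorkaroundCommands {
    static final long THRESHOLD_GIB = 48;

    /** Commands to run immediately before the upgrade (none for the second workaround). */
    static List<String> beforeUpgrade(long gibPerRn, List<String> snIds) {
        List<String> cmds = new ArrayList<>();
        if (gibPerRn <= THRESHOLD_GIB) {
            cmds.add("change-policy -params rnHeapPercent=68");
            for (String sn : snIds) {
                cmds.add("plan change-parameters -service " + sn + " -wait -params rnHeapPercent=68");
            }
        }
        return cmds;
    }

    /** Commands to run after the upgrade to 18.3 or later. */
    static List<String> afterUpgrade(long gibPerRn, List<String> snIds) {
        List<String> cmds = new ArrayList<>();
        if (gibPerRn <= THRESHOLD_GIB) {
            // First workaround: no more than 48 GiB per RN.
            for (String sn : snIds) {
                cmds.add("plan change-parameters -service " + sn + " -wait -params memoryMB=0");
            }
        } else {
            // Second workaround: more than 48 GiB per RN.
            cmds.add("change-policy -params systemPercent=10");
            for (String sn : snIds) {
                cmds.add("plan change-parameters -service " + sn
                        + " -wait -params systemPercent=10 memoryMB=0");
            }
        }
        return cmds;
    }

    public static void main(String[] args) {
        List<String> sns = List.of("sn1", "sn2", "sn3");
        System.out.println("Before upgrade:");
        beforeUpgrade(32, sns).forEach(System.out::println);
        System.out.println("After upgrade:");
        afterUpgrade(32, sns).forEach(System.out::println);
    }
}
```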

Note that making changes to multiple Storage Nodes to update Java memory settings may result in warnings in the debug logs regarding mismatched cache sizes such as:

2019-11-14 15:26:40.762 UTC WARNING - [rg1-rn3] JE: Mismatched cache sizes, feeder:516738252 replica: 375809638 feeder off-heap: 0 replica off-heap: 0

Once the changes are completed for all Storage Nodes, these warnings should stop being reported, and the warnings logged while the changes are in progress should be harmless.

[#27855]

Out-of-Order Processing During Streams API and Partition Migration

When an application uses the Streams API with a subscription that has multiple subscribers, and an elasticity operation that involves partition migration is performed, the application may need to coordinate operations across subscribers. An elasticity change can cause the events for a given key to start being delivered to a different subscriber. The Streams API delivers the events in the proper order to the original subscriber and the new one, but it is up to the application to make sure that the two subscribers perform actions for those events in the correct order. We hope to remove the need for this coordination in a future release.
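One possible coordination strategy, suitable only when the application cares about the latest value of each key, is to have all subscribers share a per-key record of the newest change applied so far and to skip older events. The Java sketch below illustrates this under a clearly stated assumption: each event carries an application-maintained, per-key version number that increases with every write (a hypothetical field the application would store itself, since stream positions from different shards are generally not comparable). Applications that must act on every event, such as those maintaining counters, need a different approach.

```java
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical helper shared by all subscribers of one subscription.
class PerKeyOrdering {
    // Newest application-level version applied so far, per key.
    private final ConcurrentHashMap<String, Long> applied = new ConcurrentHashMap<>();

    /**
     * Runs the action only if this event's version is newer than the last one
     * applied for the key. Returns true if the action ran, false if the event was
     * stale (for example, an older event still queued at the original subscriber
     * after the new subscriber applied a newer one). compute() is atomic per key,
     * so two subscribers cannot interleave here; the action must not modify this map.
     */
    boolean applyIfNewer(String key, long version, Runnable action) {
        boolean[] ran = {false};
        applied.compute(key, (k, last) -> {
            if (last != null && version <= last) {
                return last; // stale: keep the newer version and skip the action
            }
            action.run();
            ran[0] = true;
            return version;
        });
        return ran[0];
    }

    public static void main(String[] args) {
        PerKeyOrdering ordering = new PerKeyOrdering();
        // The new subscriber applies version 2 first...
        System.out.println(ordering.applyIfNewer("user/42", 2, () -> System.out.println("apply v2")));
        // ...then the original subscriber delivers the older version 1, which is skipped.
        System.out.println(ordering.applyIfNewer("user/42", 1, () -> System.out.println("apply v1")));
    }
}
```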

[#27541]

IDENTITY Column Definition Missing in Export Package

The Import/Export utility does not export the IDENTITY column property for a table into the export package DDL file (tableSchema.ddl). This is a bug and will be fixed in a future release. The missing IDENTITY column property only becomes apparent when the export package is imported. Here are the possible scenarios:

If the import table already exists and is non-empty, and the IDENTITY column is defined as GENERATED ALWAYS, the Oracle NoSQL Database will return an error saying that users cannot supply a value for GENERATED ALWAYS.
If the import table already exists and is non-empty, and the IDENTITY column is defined as GENERATED BY DEFAULT, the Import/Export utility will return an error saying that the record is already present. The user can choose to overwrite the records by setting the import config file option overwrite to true.
If the import table exists and is empty, and the IDENTITY column is defined as GENERATED ALWAYS, the Oracle NoSQL Database will return an error saying that users cannot supply a value for GENERATED ALWAYS.
If the import table exists and is empty, and the IDENTITY column is defined as GENERATED BY DEFAULT, the import will succeed, taking the values from the export package. The user can then use the ALTER TABLE command to set the START WITH value to the next value in the sequence.
If the import table does not exist, the import will create the table using the DDL in the export package, which lacks the IDENTITY column property, so knowledge of the original IDENTITY column is lost. The import will succeed with the semantics of a table without an IDENTITY column.

In all of these cases, you can add or modify the IDENTITY column property using the ALTER TABLE command. See the IDENTITY column documentation for more details.
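For the GENERATED BY DEFAULT case, the new START WITH value is one increment past the largest identity value that was imported. The following Java sketch computes that value and builds an ALTER TABLE statement. The statement text is an assumption about the IDENTITY syntax and should be checked against the IDENTITY column documentation before use; the class, table, and column names are hypothetical, the imported values are assumed to come from the application (for example, from a query run after the import), and a positive increment is assumed.

```java
import java.util.List;

// Hypothetical helper for restarting an IDENTITY sequence after an import.
class IdentityRestart {
    /** Next START WITH value: one increment past the largest imported identity value. */
    static long nextStartWith(List<Long> importedIds, long incrementBy) {
        long max = importedIds.stream()
                .mapToLong(Long::longValue)
                .max()
                .orElseThrow(() -> new IllegalArgumentException("no imported rows"));
        return Math.addExact(max, incrementBy); // fails loudly on overflow
    }

    /** Builds the assumed ALTER TABLE statement; verify the syntax before running it. */
    static String alterStatement(String table, String column, long startWith) {
        return "ALTER TABLE " + table + " (MODIFY " + column
                + " GENERATED BY DEFAULT AS IDENTITY (START WITH " + startWith + "))";
    }

    public static void main(String[] args) {
        long start = nextStartWith(List.of(1L, 7L, 42L), 1);
        System.out.println(alterStatement("users", "id", start));
    }
}
```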

[#27562]

Export Hangs When Disk is Full at Sink

During an export, the Import/Export tool will hang if the sink runs out of disk space. This issue will be fixed in a future release. Users must free up disk space at the sink and then restart the export. If the export was started in -verbose mode, the output shows a java.io.IOException: No space left on device error, as in the following example:

java -jar /home/jinzha/mywork/kv/lib/kvtool.jar export -helper-hosts 192.168.56.1:5000 \
-store kvstore -export-all -config /home/jinzha/mywork/export.cfg -verbose
Enter command: export
2019-04-22 23:55:16.316 UTC Start migration with configuration:
{
  "configFileVersion" : 1,
  "abortOnError" : true,
  "source" : {
    "type" : "nosqldb",
    "helperHosts" : [ "192.168.56.1:5000" ],
    "storeName" : "kvstore"
  },
  "sink" : {
    "type" : "file",
    "format" : "binary",
    "path" : "/home/jinzha/mywork/data"
  }
}
2019-04-22 23:55:16.338 UTC TaskWaiter thread spawned.
2019-04-22 23:55:16.693 UTC Exporting table schema: users. TableVersion: 1
2019-04-22 23:55:16.695 UTC Creating a new RecordStream for SchemaDefinition. File segment number: 1. Chunk sequence: abcdefghijlk
2019-04-22 23:55:16.701 UTC WriteTask worker thread spawned for SchemaDefinition
2019-04-22 23:55:16.704 UTC [binary]: Exported 1 record from tableSchema: 0min 0sec 361ms
2019-04-22 23:55:16.729 UTC Exporting store data with configuration: consistency=null; requestTimeout=0ms
2019-04-22 23:55:16.773 UTC Creating a new RecordStream for users. File segment number: 1. Chunk sequence: abcdefghijlk
2019-04-22 23:55:16.788 UTC WriteTask worker thread spawned for users
2019-04-22 23:55:18.954 UTC Exception exporting users. Chunk sequence: abcdefghijlk
java.io.IOException: No space left on device
    at java.io.FileOutputStream.writeBytes(Native Method)
    at java.io.FileOutputStream.write(FileOutputStream.java:326)
    at oracle.kv.util.expimp.utils.exp.LocalStoreOutput.exportDataStream(LocalStoreOutput.java:211)
    at oracle.kv.util.expimp.utils.exp.LocalStoreOutput.doExport(LocalStoreOutput.java:149)
    at oracle.kv.util.expimp.utils.exp.AbstractStoreOutput$WriteTask.call(AbstractStoreOutput.java:639)
    at oracle.kv.util.expimp.utils.exp.AbstractStoreOutput$WriteTask.call(AbstractStoreOutput.java:620)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
2019-04-22 23:56:06.705 UTC Exception exporting SchemaDefinition. Chunk sequence: abcdefghijlk
java.io.IOException: No space left on device
    at java.io.FileOutputStream.writeBytes(Native Method)
    at java.io.FileOutputStream.write(FileOutputStream.java:326)
    at oracle.kv.util.expimp.utils.exp.LocalStoreOutput.exportDataStream(LocalStoreOutput.java:211)
    at oracle.kv.util.expimp.utils.exp.LocalStoreOutput.doExport(LocalStoreOutput.java:149)
    at oracle.kv.util.expimp.utils.exp.AbstractStoreOutput$WriteTask.call(AbstractStoreOutput.java:639)
    at oracle.kv.util.expimp.utils.exp.AbstractStoreOutput$WriteTask.call(AbstractStoreOutput.java:620)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
2019-04-22 23:56:16.708 UTC [binary]: Writing continue.., wait 1 minutes
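Because the tool hangs rather than exiting, it may not be obvious that the export has stopped. If the -verbose output is captured to a file, a small check like the following Java sketch can look for the error shown above. The class name and the idea of redirecting the output to a log file (export.log here) are assumptions for illustration, not features of the Import/Export tool.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Stream;

// Hypothetical checker for captured -verbose export output.
class ExportLogCheck {
    // Error text written in -verbose mode when the sink runs out of space.
    static final String DISK_FULL = "java.io.IOException: No space left on device";

    /** True if any captured output line reports that the sink ran out of space. */
    static boolean reportsDiskFull(List<String> lines) {
        return lines.stream().anyMatch(line -> line.contains(DISK_FULL));
    }

    /** Same check, reading the captured output from a file. */
    static boolean sinkDiskFull(Path logFile) throws IOException {
        try (Stream<String> lines = Files.lines(logFile)) {
            return lines.anyMatch(line -> line.contains(DISK_FULL));
        }
    }

    public static void main(String[] args) throws IOException {
        Path log = Paths.get(args.length > 0 ? args[0] : "export.log");
        if (!Files.exists(log)) {
            System.out.println("No captured output at " + log);
        } else if (sinkDiskFull(log)) {
            System.out.println("Sink is full: free disk space, then restart the export");
        } else {
            System.out.println("No disk-full error found");
        }
    }
}
```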

[#27574]

Need a Minimum of 5 GB of Free Disk Space to Deploy a Storage Node That Hosts an Admin

If a Storage Node that hosts an admin is deployed on a system with less than 5 GB of free disk space, the following exception will occur:

Connected to Admin in read-only mode
(JE 18.1.8) Database AdminSchemaVersion not found. (18.1.3)

Make sure you have at least 5 GB of free disk space to successfully deploy a storage node. This same problem will occur when deploying KVLite. We expect to remove this restriction in a future release.
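A quick pre-deployment check can compare the usable space on the target file system with the 5 GB minimum. The following Java sketch uses java.nio.file.FileStore.getUsableSpace for this and treats 5 GB as 5 * 1024 * 1024 * 1024 bytes (5,368,709,120), which matches the freeDiskLimit value in the Admin log excerpts elsewhere in these notes. The class name is hypothetical, and the directory to check (for example, the intended storage node root directory) is an input you supply.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical pre-deployment free-space check.
class FreeSpaceCheck {
    static final long MIN_FREE_BYTES = 5L * 1024 * 1024 * 1024; // 5,368,709,120 bytes

    static boolean hasEnoughSpace(long usableBytes) {
        return usableBytes >= MIN_FREE_BYTES;
    }

    /** Usable bytes on the file store that holds the given directory. */
    static long usableBytes(Path dir) throws IOException {
        return Files.getFileStore(dir).getUsableSpace();
    }

    public static void main(String[] args) throws IOException {
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        long usable = usableBytes(dir);
        System.out.printf("%s: %,d bytes usable, %s%n", dir.toAbsolutePath(), usable,
                hasEnoughSpace(usable) ? "OK" : "below the 5 GB minimum");
    }
}
```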

[#26818]

Users Must Manage Admin Directory Size, Can Put All Admins Into "RUNNING,UNKNOWN" State

Every Admin is allocated a maximum of 3 GB of disk space by default, which is sufficient space for the vast majority of installations. However, under some rare circumstances you might want to change this 3 GB limit, especially if the Admin is sharing a disk with a Storage Node. For more information, see Managing Admin Directory Size.

If Admins run out of disk space, then there will be entries in the Admin logs saying "Disk usage is not within je.maxDisk or je.freeDisk limits and write operations are prohibited" and the output of the ping command will show all the Admins in the "RUNNING,UNKNOWN" state. Follow the procedure described in Managing Admin Directory Size to bring the Admins back to the "RUNNING,MASTER" or "RUNNING,REPLICA" state.

Below is sample output of the ping command, followed by log entries, indicating that the Admins ran out of disk space.

kv-> ping
Connected to Admin in read-only mode
Pinging components of store kvstore based upon topology sequence #106
90 partitions and 3 storage nodes
Time: 2018-04-03 08:20:22 UTC   Version: 18.3.0
Shard Status: healthy:3 writable-degraded:0 read-only:0 offline:0 total:3
Admin Status: read-only
Zone [name=Houston id=zn1 type=PRIMARY allowArbiters=false masterAffinity=false]
    RN Status: online:9 offline:0 maxDelayMillis:0 maxCatchupTimeSecs:0
Storage Node [sn1] on localhost:10000
    Zone: [name=Houston id=zn1 type=PRIMARY allowArbiters=false masterAffinity=false]
    Status: RUNNING   Ver: 18.3.0 2018-04-03 05:36:25 UTC  Build id: ec627ef967d6 Edition: Enterprise
        Admin [admin1]          Status: RUNNING,UNKNOWN
        Rep Node [rg1-rn1]      Status: RUNNING,REPLICA sequenceNumber:93 haPort:10011 delayMillis:0 catchupTimeSecs:0
        Rep Node [rg2-rn1]      Status: RUNNING,REPLICA sequenceNumber:93 haPort:10012 delayMillis:0 catchupTimeSecs:0
        Rep Node [rg3-rn1]      Status: RUNNING,MASTER sequenceNumber:92 haPort:10013
Storage Node [sn2] on localhost:11000
    Zone: [name=Houston id=zn1 type=PRIMARY allowArbiters=false masterAffinity=false]
    Status: RUNNING   Ver: 18.3.0 2018-04-03 05:36:25 UTC  Build id: ec627ef967d6 Edition: Enterprise
        Admin [admin2]          Status: RUNNING,UNKNOWN
        Rep Node [rg1-rn2]      Status: RUNNING,REPLICA sequenceNumber:93 haPort:11021 delayMillis:0 catchupTimeSecs:0
        Rep Node [rg2-rn2]      Status: RUNNING,MASTER sequenceNumber:93 haPort:11022
        Rep Node [rg3-rn2]      Status: RUNNING,REPLICA sequenceNumber:92 haPort:11023 delayMillis:0 catchupTimeSecs:0
Storage Node [sn3] on localhost:12000
    Zone: [name=Houston id=zn1 type=PRIMARY allowArbiters=false masterAffinity=false]
    Status: RUNNING   Ver: 18.3.0 2018-04-03 05:36:25 UTC  Build id: ec627ef967d6 Edition: Enterprise
        Admin [admin3]          Status: RUNNING,UNKNOWN
        Rep Node [rg1-rn3]      Status: RUNNING,MASTER sequenceNumber:93 haPort:12011
        Rep Node [rg2-rn3]      Status: RUNNING,REPLICA sequenceNumber:93 haPort:12012 delayMillis:0 catchupTimeSecs:0
        Rep Node [rg3-rn3]      Status: RUNNING,REPLICA sequenceNumber:92 haPort:12013 delayMillis:0 catchupTimeSecs:0

2018-04-03 08:18:52.254 UTC SEVERE [admin1] JE: Disk usage is not within
je.maxDisk or je.freeDisk limits and write operations are prohibited:
maxDiskLimit=2,097,152 freeDiskLimit=5,368,709,120
adjustedMaxDiskLimit=2,097,152 maxDiskOverage=83,086
freeDiskShortage=-6,945,071,104 diskFreeSpace=12,313,780,224
availableLogSize=-83,086 totalLogSize=2,180,238 activeLogSize=2,180,238
reservedLogSize=0 protectedLogSize=0 protectedLogSizeMap={}

2018-04-03 08:19:34.808 UTC SEVERE [admin2] JE: Disk usage is not within
je.maxDisk or je.freeDisk limits and write operations are prohibited:
maxDiskLimit=2,097,152 freeDiskLimit=5,368,709,120
adjustedMaxDiskLimit=2,097,152 maxDiskOverage=97,346
freeDiskShortage=-6,944,923,648 diskFreeSpace=12,313,632,768
availableLogSize=-97,346 totalLogSize=2,194,498 activeLogSize=2,194,498
reservedLogSize=0 protectedLogSize=0 protectedLogSizeMap={}

2018-04-03 08:19:36.063 UTC SEVERE [admin3] JE: Disk usage is not within
je.maxDisk or je.freeDisk limits and write operations are prohibited:
maxDiskLimit=2,097,152 freeDiskLimit=5,368,709,120
adjustedMaxDiskLimit=2,097,152 maxDiskOverage=101,698
freeDiskShortage=-6,944,923,648 diskFreeSpace=12,313,632,768
availableLogSize=-101,698 totalLogSize=2,198,850 activeLogSize=2,198,850
reservedLogSize=0 protectedLogSize=0 protectedLogSizeMap={}
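When a store has many Admins, checking ping output by eye can be tedious. The following Java sketch scans captured ping output for Admin lines in the "RUNNING,UNKNOWN" state, based on the line format shown above. The class name is hypothetical, and the pattern may need adjusting if the ping output format differs in your release.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical scanner for captured ping output.
class AdminStatusCheck {
    // Matches lines such as "Admin [admin1]          Status: RUNNING,UNKNOWN".
    // The summary line "Admin Status: read-only" has no brackets and does not match.
    private static final Pattern ADMIN_LINE =
            Pattern.compile("Admin \\[(\\S+)\\]\\s+Status: (\\S+)");

    /** Returns the IDs of Admins reported as RUNNING,UNKNOWN. */
    static List<String> unknownAdmins(List<String> pingLines) {
        List<String> result = new ArrayList<>();
        for (String line : pingLines) {
            Matcher m = ADMIN_LINE.matcher(line);
            if (m.find() && m.group(2).equals("RUNNING,UNKNOWN")) {
                result.add(m.group(1));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> sample = List.of(
                "Admin Status: read-only",
                "        Admin [admin1]          Status: RUNNING,UNKNOWN",
                "        Admin [admin2]          Status: RUNNING,MASTER");
        System.out.println(unknownAdmins(sample));
    }
}
```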

[#26922]

Store With Full Text Search May Become Unsynchronized

A store that has enabled support for Full Text Search may, on rare occasions, encounter a bug in which internal components of a master Replication Node become unsynchronized, causing updates from that Replication Node to stop flowing to the Elasticsearch engine. This problem will cause data to be out of sync between the store and Elasticsearch.

When the problem occurs, the Elasticsearch indices stop being populated. The problem involves the shutdown of the feeder channel for a component called the TextIndexFeeder, and is logged in the debug logs for the Replication Node. For example:

2018-03-16 11:23:46.055 UTC INFO [rg1-rn1] JE: Inactive channel: TextIndexFeeder-rg1-rn1-b4e92291-3c73-4128-9557-62dbd4e9ac78(2147483647) forced close. Timeout: 10000ms.
2018-03-16 11:23:46.059 UTC INFO [rg1-rn1] JE: Shutting down feeder for replica TextIndexFeeder-rg1-rn1-b4e92291-3c73-4128-9557-62dbd4e9ac78 Reason: null write time:  32ms Avg write time: 100us
2018-03-16 11:23:46.060 UTC INFO [rg1-rn1] JE: Feeder Output for TextIndexFeeder-rg1-rn1-b4e92291-3c73-4128-9557-62dbd4e9ac78 soft shutdown initiated.
2018-03-16 11:23:46.064 UTC WARNING [rg1-rn1] internal exception Expected bytes: 6 read bytes: 0
com.sleepycat.je.utilint.InternalException: Expected bytes: 6 read bytes: 0
    at com.sleepycat.je.rep.subscription.SubscriptionThread.loopInternal(SubscriptionThread.java:719)
    at com.sleepycat.je.rep.subscription.SubscriptionThread.run(SubscriptionThread.java:180)
Caused by: java.io.IOException: Expected bytes: 6 read bytes: 0
    at com.sleepycat.je.rep.utilint.BinaryProtocol.fillBuffer(BinaryProtocol.java:446)
    at com.sleepycat.je.rep.utilint.BinaryProtocol.read(BinaryProtocol.java:466)
    at com.sleepycat.je.rep.subscription.SubscriptionThread.loopInternal(SubscriptionThread.java:656)
    ... 1 more

2018-03-16 11:23:46.064 UTC INFO [rg1-rn1] SubscriptionProcessMessageThread soft shutdown initiated.
2018-03-16 11:23:46.492 UTC INFO [rg1-rn1] JE: Feeder output for TextIndexFeeder-rg1-rn1-b4e92291-3c73-4128-9557-62dbd4e9ac78 shutdown. feeder VLSN: 4,066 currentTxnEndVLSN: 4,065

If the TextIndexFeeder channel is shut down, the user can restore it by creating a dummy full text search index. Here is an example of how you can do that.

Assuming that Elasticsearch is already registered, execute the following commands from the Admin CLI:

execute 'CREATE TABLE dummy (id INTEGER,title STRING,PRIMARY KEY (id))'
execute 'CREATE FULLTEXT INDEX dummytextindex ON dummy (title)'
execute 'DROP TABLE dummy'

Note that dummy is the name of a temporary table; a table with that name must not already exist.

Creating a full text search index reestablishes the channel from the store to Elasticsearch and brings the data in Elasticsearch back up to date.
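Since the temporary table must not already exist, one option is to generate a unique name for each attempt. The following Java sketch builds the three Admin CLI commands above with a random suffix on the table and index names. The naming scheme is an assumption: it relies on table and index names being allowed to contain lowercase letters, digits, and underscores. The class name is hypothetical.

```java
import java.util.List;
import java.util.UUID;

// Hypothetical generator for the dummy full text index workaround commands.
class DummyTextIndex {
    /** Builds the three commands using the given name suffix. */
    static List<String> commands(String suffix) {
        String table = "dummy_" + suffix;
        String index = "dummytextindex_" + suffix;
        return List.of(
                "execute 'CREATE TABLE " + table + " (id INTEGER,title STRING,PRIMARY KEY (id))'",
                "execute 'CREATE FULLTEXT INDEX " + index + " ON " + table + " (title)'",
                "execute 'DROP TABLE " + table + "'");
    }

    /** Same commands with a random 12-character hex suffix. */
    static List<String> commands() {
        String suffix = UUID.randomUUID().toString().replace("-", "").substring(0, 12);
        return commands(suffix);
    }

    public static void main(String[] args) {
        commands().forEach(System.out::println);
    }
}
```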

[#26859]

Data Verifier is Enabled By Default

The data verifier is turned on by default. In some cases, the data verifier was using a lot of I/O bandwidth and causing the system to slow down. Users can turn off the data verifier by issuing the following two commands from the Admin CLI:

plan change-parameters -wait -all-rns -params "configProperties=je.env.runVerifier=false"
change-policy -params "configProperties=je.env.runVerifier=false"

Note that if the store has services with preexisting settings for the configProperties parameter, users will need to retrieve the current values and merge them with the new setting that disables the verifier:

show param -service rg1-rn1
show param -policy

For example, suppose rg1-rn1 has set the following cleaner parameter:

kv-> show param -service rg1-rn1
[...]
configProperties=je.cleaner.minUtilization=40

When updating the configProperties parameter, add the new verifier setting to the existing settings, separated by semicolons:

plan change-parameters -wait -all-rns -params "configProperties=je.cleaner.minUtilization=40;je.env.runVerifier=false"
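The merge step can be sketched in Java as follows. The sketch assumes that configProperties entries use the name=value form shown in this section and are separated by semicolons; if a setting with the same name already exists, its value is replaced rather than duplicated. The class and method names are hypothetical, and the sketch only builds the string, which you still pass to plan change-parameters yourself.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper for merging a setting into an existing configProperties value.
class ConfigPropertiesMerge {
    /** Merges one name=value setting into a semicolon-separated configProperties value. */
    static String merge(String existing, String name, String value) {
        Map<String, String> settings = new LinkedHashMap<>(); // keeps the original order
        if (existing != null && !existing.isBlank()) {
            for (String entry : existing.split(";")) {
                String trimmed = entry.trim();
                if (trimmed.isEmpty()) {
                    continue;
                }
                int eq = trimmed.indexOf('=');
                if (eq < 0) {
                    throw new IllegalArgumentException("expected name=value: " + trimmed);
                }
                settings.put(trimmed.substring(0, eq), trimmed.substring(eq + 1));
            }
        }
        settings.put(name, value); // replaces an existing setting with the same name
        StringJoiner joined = new StringJoiner(";");
        settings.forEach((k, v) -> joined.add(k + "=" + v));
        return joined.toString();
    }

    public static void main(String[] args) {
        String merged = merge("je.cleaner.minUtilization=40", "je.env.runVerifier", "false");
        System.out.println("configProperties=" + merged);
    }
}
```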

[KVSTORE-639]

Subscription Cannot Connect and Fails With InternalException

If a master transfer occurs due to a failure after the publisher is started and before a subscriber connects, an InternalException can occur when the subscriber tries to connect. The exception message will read "Failed to connect, will retry after sleeping 3000 ms". Restart the publisher to work around this problem.

[#27723]