Raft Replication Operations and Settings

Specifying Replication Unit Attributes

By default, Oracle Globally Distributed Database determines the number of replication units (RUs) in a shardspace and the number of chunks in an RU.

You can specify the number of primary RUs using the -repunit option when you create the shard catalog. Specify a value greater than zero (0).

gdsctl> create shardcatalog ... -repunit number

The RU value cannot be modified after the first DEPLOY command is run on the distributed database configuration. Before you run DEPLOY, you can modify the number of RUs using the MODIFY SHARDSPACE command.

gdsctl> modify shardspace -shardspace shardspaceora -repunit number

Note that in system-managed sharding there is one shardspace named shardspaceora.

If the -repunit parameter is not specified, the default number of RUs is determined at the time of execution of the first DEPLOY command.

Ensuring Replicas Are Not Placed in the Same Rack

To ensure high availability, Raft replication group members should not be placed in the same rack.

If you specify a rack with the ADD SHARD command -rack rack_id option, the shard catalog enforces that shards containing replicated data are not placed in the same rack. If this constraint cannot be satisfied, an error is raised.

gdsctl> add shard -connect connect_identifier … -rack rack_id

Getting Runtime Information for Replication Units

Use GDSCTL STATUS REPLICATION to get replication unit runtime statistics, such as the leader and its followers.

GDSCTL STATUS REPLICATION can also be entered as STATUS RU, or just RU.

When the -ru option is specified, you can get information for a particular replication unit.

For example, to get information about replication unit 1:

GDSCTL> status ru -ru 1

Replication units
------------------------
Database    RU#  Role      Term  Log Index  Status
--------    ---  ----      ----  ---------  ------
cdbsh1_sh1  1    Leader    2     21977067   Ok
cdbsh2_sh2  1    Follower  2     21977067   Ok
cdbsh3_sh3  1    Follower  2     21977067   Ok
cdbsh1_sh1  2    Follower  1     32937130   Ok
cdbsh2_sh2  2    Leader    1     32937130   Ok
cdbsh3_sh3  2    Follower  1     32937130   Ok
cdbsh1_sh1  3    Follower  2     16506205   Ok
cdbsh2_sh2  3    Follower  2     16506205   Ok
cdbsh3_sh3  3    Leader    2     16506205   Ok

For details about the command syntax, options, and examples, see status ru (RU, status replication) in Global Data Services Concepts and Administration Guide.

Scaling with Raft Replication

You can add or remove shards in a Raft replication-enabled distributed database using the following instructions.

Adding Shards

To scale up your Raft replication-enabled distributed database, you create and validate a new shard database, add the shard to the distributed database configuration, and deploy the updated configuration.

The distributed database automatically rebalances the data onto the new shard by default, but you can choose to rebalance the data manually by specifying an extra option at deploy time.

See Work Flow for Adding Shards for detailed steps to add shards to the distributed database.

Generally, when a new shard is deployed in the configuration, you see the behavior noted in About Adding Shards.

With Raft replication, when the shard is deployed, you can configure which of the following two scenarios takes place:

  • Automatic rebalancing: By default, Raft replication automatically splits the replication units (RUs), such that for each shard added, two new RUs are created, and their leaders are placed on the new shard. Chunks from other RUs are moved to the new RUs to rebalance data distribution. The DEPLOY operation also creates some intermediate (staging) RUs for relocating and balancing chunks; these staging RUs are dropped after the DEPLOY tasks are completed.

    After deploying new shards, you can run CONFIG TASK to view the ongoing rebalancing tasks, as shown in the example after this list.

  • Manual rebalancing: If you don't want automatic rebalancing to occur, you can deploy the updated distributed database configuration with the GDSCTL DEPLOY -no_rebalance option, and then manually move RUs and chunks to suit your needs.

    See Moving Replication Unit Replicas and Moving A Chunk to Another Replication Unit for details.
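
For example, the two paths look like this in GDSCTL (a sketch; output is omitted, and rebalancing task details vary by configuration):

GDSCTL> deploy
GDSCTL> config task

or, to skip automatic rebalancing:

GDSCTL> deploy -no_rebalance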

Removing Shards

To scale down your Raft replication-enabled distributed database, you move any RUs on the shard to the remaining shards, remove the shard from the distributed database configuration, and deploy the updated configuration.

  1. Move any RUs off of the shard you plan to remove.

    Before making any changes to the distributed database configuration, move RUs to the other shards.

    See Moving Replication Unit Replicas for complete details.

    1. Switch over all of the leader RU members from the shard to be dropped to other shards.

      GDSCTL> switchover ru -ru 4
       -shard target_shard
       [-timeout=n]
    2. Move all of the follower RU members from the shard to be dropped to other shard databases where a follower does not exist for the specific RU you are moving.

      GDSCTL> move ru -ru 1
       -source shard_to_be_dropped
       -target target_shard_where_ru_follower_does_not_exist
  2. Remove the shard from the distributed database configuration.

    1. Verify that there are no RUs on the shard to be dropped.

      GDSCTL> status ru -shard shard_to_be_dropped 
      
    2. Remove the shard PDB and CDB from the distributed database configuration, and remove the host address information from the VNCR list.

      GDSCTL> remove shard shard_to_be_dropped
      GDSCTL> remove cdb shard_to_be_dropped_cdb_name
      GDSCTL> remove invitednode node
  3. You are left with additional RUs on the remaining shards. You can choose to:
    • Relocate the chunks on the remaining shards and remove the RUs that were moved from the dropped shard. For example:

      GDSCTL> relocate chunk -chunk 3,4 -sourceru 1 -targetru 2
      GDSCTL> remove ru -ru 4

      See Moving A Chunk to Another Replication Unit and remove ru (replication_unit) for details.

    • Keep the additional RUs if they are needed for a new shard in the future.

Moving Replication Unit Replicas

Use MOVE RU to move a follower replica of a replication unit from one shard database to another.

For example,

gdsctl> move ru -ru 1 -source dba -target dbb

Notes:

  • The source database must not contain the replica leader
  • The target database must not already contain another replica of the replication unit

See move ru (replication_unit) in Global Data Services Concepts and Administration Guide for syntax and option details.

Changing the Replication Unit Leader

Using SWITCHOVER RU, you can change which replica is the leader for the specified replication unit.

The -shard option makes the replication unit member on the specified shard (database) the new leader of the given RU.

gdsctl> switchover ru -ru 1 -shard dba

To then automatically rebalance the leaders, use SWITCHOVER RU -rebalance.
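
For example, a minimal sketch (see the command reference below for the full syntax and any additional options):

gdsctl> switchover ru -rebalance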

For full syntax and option details, see switchover ru (replication_unit) in Global Data Services Concepts and Administration Guide.

Copying Replication Units

You can copy a replication unit from one shard database to another using COPY RU. This allows you to instantiate or repair a replica of a replication unit on the target shard database.

For example, to copy replication unit 1 from dba to dbb:

gdsctl> copy ru -ru 1 -source dba -target dbb

Notes:

  • Neither the source database nor the target database can be the replica leader for the given replication unit
  • If the target database already contains this replication unit, it is replaced by a full replica of the replication unit on the source database
  • If -replace is specified, the replication unit is removed from the database named by -replace
  • If the target database doesn't contain the specified replication unit, then the total number of members for the given replication unit must be less than the replication factor (3), unless -replace is specified.
    gdsctl> copy ru -ru 1 -source dba -target dbc -replace dbb
  • If -source is not specified, then an existing follower of the replication unit is chosen as the source database, as shown in the example after this list.
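
For example, to instantiate a replica of replication unit 1 on dbc, letting GDSCTL choose an existing follower as the source (database names as in the examples above):

gdsctl> copy ru -ru 1 -target dbc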

Note:

Because running this command requires a tablespace set for the destination chunk, create at least one tablespace set before running this command.

For syntax and option details, see copy ru (replication_unit) in Global Data Services Concepts and Administration Guide.

Moving A Chunk to Another Replication Unit

To move a chunk from one Raft replication unit to another replication unit, use the GDSCTL RELOCATE CHUNK command.

To use RELOCATE CHUNK, the source and target replication unit leaders must be located on the same shard, and their followers must also be on the same shards. If they are not on the same shard, use SWITCHOVER RU to move the leader and MOVE RU to move the followers to co-located shards.

When moving chunks, specify the chunk ID numbers, the source RU ID from which to move them, and the target RU ID to move them to, as shown here.

GDSCTL> relocate chunk -chunk 3,4 -sourceru 1 -targetru 2

The specified chunks must be in the same source replication unit. If -targetru is not specified, a new empty target replication unit is created.

GDSCTL MOVE CHUNK is not supported for moving chunks in a distributed database with Raft replication enabled.

Note:

Because running this command requires a tablespace set for the destination chunk, create at least one tablespace set before running this command.

See also relocate chunk in Global Data Services Concepts and Administration Guide.

Splitting Chunks in Raft Replication

You can manually split chunks with GDSCTL SPLIT CHUNK in a Raft replication-enabled distributed database.

If you want to move some data within an RU to a new chunk, you can use GDSCTL SPLIT CHUNK to manually split the chunk.

You can then use RELOCATE CHUNK to move the new chunk to another RU if you wish. See Moving A Chunk to Another Replication Unit.
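
For example, to split chunk 3 (a sketch; see the SPLIT CHUNK command reference in Global Data Services Concepts and Administration Guide for the full option list):

GDSCTL> split chunk -chunk 3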

Note:

Because running this command requires a tablespace set for the destination chunk, create at least one tablespace set before running this command.

Getting the Replication Type

To find out if your distributed database is using Raft replication, run CONFIG SDB to see the replication type in the command output.

In the command output, the Replication Type is listed as Native when Raft replication is enabled.

For example,

GDSCTL> config sdb

GDS Pool administrators
------------------------

Replication Type
------------------------
Native

Shard type
------------------------
System-managed

Shard spaces
------------------------
shardspaceora

Services
------------------------
oltp_ro_srvc
oltp_rw_srvc

Starting and Stopping Replication Units

The GDSCTL commands START RU and STOP RU can be used to facilitate maintenance operations.

You might want to stop the RU on a specific replica to disable replication temporarily so that you can do maintenance tasks on the database, operating system, or machine.

You can run START RU and STOP RU commands for specific replicas within a given replication unit (RU) or for all replicas.

The START RU command is used to resume the operation of previously stopped RUs. Additionally, it can be used in cases where an RU is offline due to errors. For example, if the log producer process for any replica within an RU stops functioning, it results in the RU being halted. The START RU command lets you restart the RU without a complete database restart.

To use the commands, follow this syntax:

start ru -ru ru_id [-database db]
stop ru -ru ru_id [-database db]

You supply the ID of the RU that you want to start or stop, and you can optionally specify the database on which a member of the RU runs. If the database is not specified, the commands affect all available replicas of the specified replication unit.
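
For example, to temporarily stop replication unit 1 on one replica before maintenance and resume it afterward (the database name is illustrative, reusing a name from the earlier STATUS RU output):

gdsctl> stop ru -ru 1 -database cdbsh1_sh1
gdsctl> start ru -ru 1 -database cdbsh1_sh1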

Synchronizing Replication Unit Members

Use the GDSCTL command SYNC RU to synchronize data of the specified replication unit on all shards. This operation also erases Raft logs and resets log index and term.

To use SYNC RU, specify the replication unit (-ru ru_id).

gdsctl> sync ru -ru ru_id [-database db]

You can optionally specify a shard database name. If a database is not specified for the SYNC RU command, a replica to synchronize with will be chosen based on the following criteria:

  1. Pick the replica that was the last leader.
  2. If not available, pick the replica with the greatest apply index.

The status of the SYNC RU operation can be seen using gdsctl config task.

If you see "Warning: GSM timeout expired", it does not mean that the synchronization operation has failed; the operation may still be running. Replication unit synchronization can take longer to complete than the default GDSCTL timeout.

If you don't want to see "Warning: GSM timeout expired", you can set the GDSCTL global service manager (shard director) request timeout to a higher value.

gdsctl configure -gsm gsm_ID -timeout seconds -save_config
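
For example, to raise the request timeout to 300 seconds (the shard director name and timeout value are illustrative):

gdsctl configure -gsm gsm1 -timeout 300 -save_config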

Enable or Disable Reads from Follower Replication Units

Use the database initialization parameter SHARD_ENABLE_RAFT_FOLLOWER_READ to enable or disable reads from follower replication units in a shard.

Set this parameter to TRUE to enable reads, or set to FALSE to disable reads.

This parameter can have different values on different shards.
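
For example, a minimal sketch run on an individual shard, assuming the parameter can be changed dynamically with ALTER SYSTEM (check its entry in Oracle Database Reference for the supported scope):

SQL> ALTER SYSTEM SET SHARD_ENABLE_RAFT_FOLLOWER_READ = TRUE;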

See also: SHARD_ENABLE_RAFT_FOLLOWER_READ in Oracle Database Reference.

Viewing Parameter Settings

Use the SHARD_RAFT_PARAMETERS static data dictionary view to see parameters set at an RU level on each shard.

The values for these parameters, if set, can be seen in this view. The columns in the view are:

ORA_SHARD_ID: shard ID

RU_ID: replication unit ID

NAME: parameter name

VALUE: parameter value
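
For example, to list any RU-level parameter settings on the current shard (a minimal query over the columns above):

SQL> SELECT ora_shard_id, ru_id, name, value FROM shard_raft_parameters;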

For details about this view, see SHARD_RAFT_PARAMETERS in Oracle Database Reference.

Setting Parameters with GDSCTL

You can set some Raft-specific parameters at the replication unit level on each shard using the GDSCTL set ru parameter command.

Syntax

set ru parameter parameter_name=value [-shard shard_name] [-ru ru_id]

Arguments

parameter_name=value

Specify the parameter name and the value you want to set it to. See the following topics for details about each parameter setting:

  • Tuning Flow Control to Mitigate Follower Lag
  • Setting Transaction Consensus Timeout

-ru ru_id

Specify a replication unit ID number. If not specified, the command applies to all RUs.

-shard shard_name

Specify a shard name. If not specified, the command applies to all shards.

Tuning Flow Control to Mitigate Follower Lag

Flow control in Raft replication coordinates Raft group followers to optimize performance, efficiently utilize memory, and smooth out replication pipeline hiccups, such as variable network latency.

Followers may not consistently maintain the same speed. Occasionally, one might be slightly faster, while at other times, slightly slower.

To tune flow control, set the SHARD_FLOW_CONTROL parameter on the shard where a follower is lagging.

For example,

gdsctl set ru parameter SHARD_FLOW_CONTROL=value

You can optionally specify a shard (-shard) or a replication unit ID number (-ru).

The value argument can be set to one of the following (a usage example follows the list):

  • (Default) TILL_TIMEOUT: As long as the slow follower has received an LCR within a threshold time (see "Configuring Threshold Timeout" below) from the fast follower, the fast follower is throttled.

    However, if the slow follower falls behind by more than the threshold time, then it is disconnected, at which point it may or may not be able to catch up, depending on why there is a lag between the two followers. For example, if the slow follower is lagging because the network connection to it is bad for a very long time, it will be disconnected. This is also the case if the slow follower is actually a down follower.

    TILL_TIMEOUT at 10 times the heartbeat interval is the SHARD_FLOW_CONTROL default setting.

  • AT_DISTANCE: As long as the slow follower is within a threshold distance (see "Configuring Threshold Distance" below), in terms of LCRs, from the fast follower, the fast follower is throttled.

    However, if the slow follower falls behind by more than the threshold distance, then it is disconnected, at which point it may or may not be able to catch up, depending on why there is a lag between the two followers. For example, if the slow follower is lagging because the network connection to it is bad for a very long time, it will be disconnected. This is also the case if the slow follower is actually a down follower.

  • AT_LOGLIMIT: Flow control does not kick in during normal operation at all, but only starts if the log file is about to be overwritten by the leader but the slow follower still needs LCRs from the file being overwritten. When this situation occurs, the leader waits for the slow follower to consume the LCRs from the file to be overwritten.

    If the slow follower is actually a down follower, then with this option the leader waits for the slow follower to come online again when the RU's raft log limit is reached.
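
For example, to place the followers of replication unit 1 under distance-based flow control (the RU number is illustrative):

gdsctl set ru parameter SHARD_FLOW_CONTROL=AT_DISTANCE -ru 1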

Configuring Threshold Distance

Threshold distance, expressed as a percentage of the in-memory queue size for LCRs, is used by the AT_DISTANCE option for flow control.

The default value is 10.

To set the threshold distance to another value, run:

gdsctl set ru parameter SHARD_FLOW_CONTROL_NS_DISTANCE_PCT=number

You can optionally specify a shard (-shard) or a replication unit ID number (-ru).

Configuring Threshold Timeout

Threshold timeout is used by the TILL_TIMEOUT option for flow control.

The timeout is expressed as a multiple of the heartbeat interval, and the default value is 10.

To set the threshold timeout, run:

gdsctl set ru parameter SHARD_FLOW_CONTROL_TIMEOUT_MULTIPLIER=multiplier

You can optionally specify a shard (-shard) or a replication unit ID number (-ru).

See Setting Parameters with GDSCTL for details about using the set ru parameter command.

Setting Transaction Consensus Timeout

You can change the timeout value for a transaction to get consensus in Raft replication.

To configure the transaction consensus timeout, set the SHARD_TXN_ACK_TIMEOUT_SEC parameter, which specifies the maximum time a user transaction waits for the consensus of its commit before raising ORA-05086.

gdsctl set ru parameter SHARD_TXN_ACK_TIMEOUT_SEC=seconds

You can optionally specify a shard (-shard) or a replication unit ID number (-ru).

By default, Raft replication waits 90 seconds for a transaction to get consensus. However, if the leader is disconnected from the other replicas, it may not get consensus for its commits; if there is low memory in the replication pipeline, the replication of LCRs slows down, resulting in the delayed delivery of acknowledgments. In many cases such as these, 90 seconds may be too long to wait, so you may want to error out a transaction much earlier, depending on your application requirements.

The minimum valid value is 1 second.
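
For example, to fail a transaction that cannot reach consensus within 10 seconds (an illustrative value):

gdsctl set ru parameter SHARD_TXN_ACK_TIMEOUT_SEC=10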

See Setting Parameters with GDSCTL for details about using the set ru parameter command.

Dynamic Performance Views for Raft Replication

There are several dynamic performance (V$) views available for Raft replication.

  • V$SHARD_ACK_SENDER
  • V$SHARD_ACK_RECEIVER
  • V$SHARD_APPLY_COORDINATOR
  • V$SHARD_APPLY_LCR_READER
  • V$SHARD_APPLY_READER
  • V$SHARD_APPLY_SERVER
  • V$SHARD_LCR_LOGS
  • V$SHARD_LCR_PERSISTER
  • V$SHARD_LCR_PRODUCER
  • V$SHARD_NETWORK_SENDER
  • V$SHARD_MESSAGE_TRACKING
  • V$SHARD_REPLICATION_UNIT
  • V$SHARD_TRANSACTION
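
For example, to inspect replication unit state on a shard (a minimal sketch; column details are in the reference cited below):

SQL> SELECT * FROM V$SHARD_REPLICATION_UNIT;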

For descriptions and column details for these views, see Dynamic Performance Views in Oracle Database Reference.