Raft Replication Operations and Settings
Specifying Replication Unit Attributes
By default, Oracle Globally Distributed Database determines the number of replication units (RUs) in a shardspace and the number of chunks in an RU.
You can specify the number of primary RUs using the -repunit option when you create the shard catalog. Specify a value greater than zero (0).
gdsctl> create shardcatalog ... -repunit number
The RU value cannot be modified after the first DEPLOY command is run on the distributed database configuration. Before you run DEPLOY, you can modify the number of RUs using the MODIFY SHARDSPACE command.
gdsctl> modify shardspace -shardspace shardspaceora -repunit number
Note that in system-managed sharding there is one shardspace named shardspaceora.
If the -repunit parameter is not specified, the default number of RUs is determined at the time of execution of the first DEPLOY command.
Ensuring Replicas Are Not Placed in the Same Rack
To ensure high availability, Raft replication group members should not be placed in the same rack.
If the -rack rack_id option is specified with the ADD SHARD command, the shard catalog enforces that shards containing replicated data are not placed in the same rack. If this is not possible, an error is raised.
gdsctl> add shard -connect connect_identifier … -rack rack_id
Getting Runtime Information for Replication Units
Use GDSCTL STATUS REPLICATION to get replication unit runtime statistics, such as the leader and its followers.
GDSCTL STATUS REPLICATION can also be entered as STATUS RU, or just RU.
When the -ru option is specified, you can get information for a particular replication unit.
For example, to get information about replication unit 1:
GDSCTL> status ru -ru 1
Replication units
------------------------
Database RU# Role Term Log Index Status
-------- --- ---- ---- --------- ------
cdbsh1_sh1 1 Leader 2 21977067 Ok
cdbsh2_sh2 1 Follower 2 21977067 Ok
cdbsh3_sh3 1 Follower 2 21977067 Ok
cdbsh1_sh1 2 Follower 1 32937130 Ok
cdbsh2_sh2 2 Leader 1 32937130 Ok
cdbsh3_sh3 2 Follower 1 32937130 Ok
cdbsh1_sh1 3 Follower 2 16506205 Ok
cdbsh2_sh2 3 Follower 2 16506205 Ok
cdbsh3_sh3 3 Leader 2 16506205 Ok
For details about the command syntax, options, and examples, see status ru (RU, status replication) in Global Data Services Concepts and Administration Guide.
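Because the STATUS RU output is a regular whitespace-separated table, it can be post-processed with a script. The following Python sketch is purely illustrative (it is not an Oracle-provided tool, and the parsing assumes the column layout shown in the example above); it turns the rows into records and maps each RU to the database holding its leader.

```python
# Illustrative sketch (not an Oracle-provided tool): parse the tabular output
# of GDSCTL STATUS RU into records and map each RU to its current leader.
# The whitespace-separated column layout shown above is assumed.

SAMPLE = """\
cdbsh1_sh1 1 Leader 2 21977067 Ok
cdbsh2_sh2 1 Follower 2 21977067 Ok
cdbsh3_sh3 1 Follower 2 21977067 Ok
cdbsh1_sh1 2 Follower 1 32937130 Ok
cdbsh2_sh2 2 Leader 1 32937130 Ok
cdbsh3_sh3 2 Follower 1 32937130 Ok
"""

def parse_status_ru(text):
    """Return one dict per replica row, skipping headers and separators."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 6:
            continue  # header, separator, or blank line
        db, ru, role, term, log_index, status = parts
        rows.append({"db": db, "ru": int(ru), "role": role,
                     "term": int(term), "log_index": int(log_index),
                     "status": status})
    return rows

def leaders(rows):
    """Map RU id -> database holding the leader replica."""
    return {r["ru"]: r["db"] for r in rows if r["role"] == "Leader"}

print(leaders(parse_status_ru(SAMPLE)))  # {1: 'cdbsh1_sh1', 2: 'cdbsh2_sh2'}
```

A check like this can be useful in monitoring scripts, for example to alert when any replica's Status is not Ok.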
Scaling with Raft Replication
You can add or remove shards from your Raft replication distributed database with the following instructions.
Adding Shards
To scale up your Raft replicated distributed database, you create and validate a new shard database, add the shards to the distributed database configuration, and deploy the updated configuration.
The distributed database automatically rebalances the data on the new shard by default, but you can choose to manually rebalance the data with an extra option at deploy time.
See Work Flow for Adding Shards for detailed steps to add shards to the distributed database.
Generally, when a new shard is deployed in the configuration, you will see the behavior noted in About Adding Shards.
In a Raft replication case, when the shard is deployed, you can configure which of the following two scenarios takes place:
- Automatic rebalancing: By default, Raft replication automatically splits the replication units (RUs), such that for each shard added, 2 new RUs are created, and their leaders are placed on the new shard. Chunks from other RUs are moved to the new RUs to rebalance data distribution. The DEPLOY operation also creates some intermediate/staging RUs for relocating and balancing chunks, which are dropped after the DEPLOY tasks are completed. After deploying new shards you can run CONFIG TASK to view the ongoing rebalancing tasks.
- Manual rebalancing: If you don't want automatic rebalancing to occur, you can deploy the updated distributed database configuration with the GDSCTL DEPLOY -no_rebalance option, and then manually move RUs and chunks to suit your needs. See Moving Replication Unit Replicas and Moving A Chunk to Another Replication Unit for details.
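As a rough mental model of the default behavior, the toy Python sketch below (an assumption-level illustration, not GDSCTL's actual algorithm) adds two empty RUs for a new shard and moves chunks out of the existing RUs until the distribution evens out.

```python
# Toy model (illustrative only, not GDSCTL's real rebalancing algorithm):
# adding a shard creates 2 new, initially empty RUs, and chunks migrate
# from existing RUs until each RU holds roughly an even share.

def add_shard_and_rebalance(ru_chunks, next_ru_id):
    """ru_chunks: dict ru_id -> list of chunk ids. Returns a new dict."""
    ru_chunks = {ru: list(chunks) for ru, chunks in ru_chunks.items()}
    new_rus = [next_ru_id, next_ru_id + 1]  # 2 new RUs for the added shard
    for ru in new_rus:
        ru_chunks[ru] = []
    total = sum(len(c) for c in ru_chunks.values())
    target = total // len(ru_chunks)  # even share per RU (floor)
    # Fill each new RU by taking chunks from the currently fullest RU.
    for ru in new_rus:
        while len(ru_chunks[ru]) < target:
            donor = max(ru_chunks, key=lambda r: len(ru_chunks[r]))
            if len(ru_chunks[donor]) <= target:
                break  # nothing left to take without unbalancing the donor
            ru_chunks[ru].append(ru_chunks[donor].pop())
    return ru_chunks

before = {1: [1, 2, 3, 4], 2: [5, 6, 7, 8], 3: [9, 10, 11, 12]}
after = add_shard_and_rebalance(before, next_ru_id=4)
print({ru: len(c) for ru, c in sorted(after.items())})
```

In the real system the staging RUs and chunk moves are handled by DEPLOY itself; the sketch only shows the arithmetic of "2 new RUs per shard, chunks spread evenly".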
Removing Shards
To scale down your Raft replicated distributed database, you move any RUs on the shard to the remaining shards, remove the shards from the distributed database configuration, and deploy the updated configuration.
- Move any RUs off of the shard you plan to remove. Before making any changes to the distributed database configuration, move RUs to the other shards. See Moving Replication Unit Replicas for complete details.
  - Switch over all of the leader RU members from the shard to be dropped to other shards.
    GDSCTL> switchover ru -ru 4 -shard target_shard [-timeout=n]
  - Move all of the follower RU members from the shard to be dropped to other shard databases where a follower does not exist for the specific RU you are moving.
    GDSCTL> move ru -ru 1 -source shard_to_be_dropped -target target_shard_where_ru_follower_does_not_exist
- Remove the shard from the distributed database configuration.
  - Verify that there are no RUs on the shard to be dropped.
    GDSCTL> status ru -shard shard_to_be_dropped
  - Remove the shard PDB and CDB from the distributed database configuration, and remove the host address information from the VNCR list.
    GDSCTL> remove shard shard_to_be_dropped
    GDSCTL> remove cdb shard_to_be_dropped_cdb_name
    GDSCTL> remove invitednode node
- You are left with additional RUs on the remaining shards, for which you can choose to:
  - Relocate the chunks on the remaining shards and remove the RUs that were moved from the dropped shard. For example:
    GDSCTL> relocate chunk -chunk 3,4 -sourceru 1 -targetru 2
    GDSCTL> remove ru -ru 4
    See Moving A Chunk to Another Replication Unit and remove ru (replication_unit) for details.
  - Keep the additional RUs if they are needed for a new shard in the future.
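The switchover-then-move steps above can be thought of as a small planning problem. The Python sketch below is purely illustrative (the function name and the placement data structure are invented for this example, not a GDSCTL API); it emits the command strings needed to empty a shard before removal.

```python
# Illustrative planner (invented helper, not a GDSCTL API): given the replica
# placement, list the SWITCHOVER RU and MOVE RU commands needed to drain a
# shard so it can be removed.

def drain_commands(placement, drop, all_shards):
    """placement: {ru: {"leader": shard, "followers": [shard, ...]}}."""
    cmds = []
    for ru in sorted(placement):
        info = placement[ru]
        leader, followers = info["leader"], list(info["followers"])
        if leader == drop:
            # 1) Hand leadership to an existing follower on a surviving shard.
            target = sorted(f for f in followers if f != drop)[0]
            cmds.append(f"switchover ru -ru {ru} -shard {target}")
            followers.remove(target)
            followers.append(drop)  # old leader is now a follower on `drop`
            leader = target
        if drop in followers:
            # 2) Move the follower to a shard with no replica of this RU.
            holders = {leader, *followers}
            target = sorted(s for s in all_shards
                            if s not in holders and s != drop)[0]
            cmds.append(f"move ru -ru {ru} -source {drop} -target {target}")
    return cmds

placement = {
    1: {"leader": "sh4", "followers": ["sh1", "sh2"]},
    2: {"leader": "sh2", "followers": ["sh3", "sh4"]},
}
for cmd in drain_commands(placement, "sh4", ["sh1", "sh2", "sh3", "sh4"]):
    print(cmd)
```

After the planned commands run, STATUS RU -shard on the dropped shard should return no RUs, and the REMOVE SHARD sequence can proceed.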
Moving Replication Unit Replicas
Use MOVE RU to move a follower replica of a replication unit from one shard database to another.
For example,
gdsctl> move ru -ru 1 -source dba -target dbb
Notes:
- The source database must not contain the replica leader
- The target database must not already contain another replica of the replication unit
See move ru (replication_unit) in Global Data Services Concepts and Administration Guide for syntax and option details.
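The two notes above amount to simple preconditions on a move. This Python sketch (illustrative names, not an Oracle API) encodes them:

```python
# Illustrative check (not an Oracle API) of the MOVE RU preconditions: the
# source must hold a follower (not the leader) of the RU, and the target
# must not already hold any replica of it.

def can_move_ru(leader, followers, source, target):
    """leader: shard holding the RU leader; followers: shards with followers."""
    if source == leader:
        return False, "source holds the replica leader"
    if source not in followers:
        return False, "source has no replica of this RU"
    if target == leader or target in followers:
        return False, "target already has a replica of this RU"
    return True, "ok"

# RU led by dba, with followers on dbb and dbc:
print(can_move_ru("dba", ["dbb", "dbc"], "dbb", "dbd"))  # (True, 'ok')
```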
Changing the Replication Unit Leader
Using SWITCHOVER RU, you can change which replica is the leader for the specified replication unit.
The -shard option makes the replication unit member on the specified shard (database) the new leader of the given RU.
gdsctl> switchover ru -ru 1 -shard dba
To then automatically rebalance the leaders, use SWITCHOVER RU -rebalance.
For full syntax and option details, see switchover ru (replication_unit) in Global Data Services Concepts and Administration Guide.
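To build intuition for what leader rebalancing aims at, the Python sketch below (an assumption-level illustration, not GDSCTL's actual -rebalance algorithm) switches each RU's leadership to a follower on the shard currently leading the fewest RUs:

```python
# Assumption-level sketch of the goal of leader rebalancing (not GDSCTL's
# real algorithm): spread RU leaders evenly by switching leadership to a
# follower on the least-loaded shard.

def rebalance_leaders(placement):
    """placement: {ru: {"leader": shard, "followers": [...]}} (mutated).
    Returns the switchover commands applied."""
    cmds = []
    def leader_counts():
        counts = {}
        for info in placement.values():
            counts[info["leader"]] = counts.get(info["leader"], 0) + 1
        return counts
    for ru in sorted(placement):
        info = placement[ru]
        counts = leader_counts()
        # Follower whose shard currently leads the fewest RUs.
        best = min(info["followers"], key=lambda s: counts.get(s, 0))
        if counts.get(best, 0) + 1 < counts[info["leader"]]:
            cmds.append(f"switchover ru -ru {ru} -shard {best}")
            info["followers"].remove(best)
            info["followers"].append(info["leader"])
            info["leader"] = best
    return cmds

# All three leaders start on sh1:
placement = {
    1: {"leader": "sh1", "followers": ["sh2", "sh3"]},
    2: {"leader": "sh1", "followers": ["sh2", "sh3"]},
    3: {"leader": "sh1", "followers": ["sh2", "sh3"]},
}
print(rebalance_leaders(placement))
```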
Copying Replication Units
You can copy a replication unit from one shard database to another using COPY RU. This allows you to instantiate or repair a replica of a replication unit on the target shard database.
For example, to copy replication unit 1 from dba to dbb:
gdsctl> copy ru -ru 1 -source dba -target dbb
Notes:
- Neither the source database nor the target database should be the replica leader for the given replication unit
- If the target database already contains this replication unit, it will be replaced by a full replica of the replication unit on the source database
- If -replace is specified, the replication unit is removed from that database
- If the target database doesn't contain the specified replication unit, then the total number of members for the given replication unit should be less than the replication factor (3), unless -replace is specified.
  gdsctl> copy ru -ru 1 -source dba -target dbc -replace dbb
- If -source is not specified, then an existing follower of the replication unit is chosen as the source database
Note:
Because running this command requires a tablespace set for the destination chunk, create a minimum of one tablespace set before running this command.
For syntax and option details, see copy ru (replication_unit) in Global Data Services Concepts and Administration Guide.
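The COPY RU rules above can be expressed as a precondition check. The Python sketch below is illustrative only (the function and data shapes are invented for the example, and the replication factor of 3 is taken from the note above):

```python
# Illustrative check (not an Oracle API) of the COPY RU rules listed above.

REPLICATION_FACTOR = 3  # per the notes above

def can_copy_ru(leader, members, source, target, replace=None):
    """members: set of shards currently holding any replica of the RU."""
    if source == leader or target == leader:
        return False, "neither source nor target may be the replica leader"
    if (target not in members and len(members) >= REPLICATION_FACTOR
            and replace is None):
        return False, "would exceed the replication factor; use -replace"
    # If target is already a member, its copy is simply replaced.
    return True, "ok"

# dba leads; dbb and dbc are followers. Copying to a fourth shard dbd:
print(can_copy_ru("dba", {"dba", "dbb", "dbc"}, "dbb", "dbd"))
print(can_copy_ru("dba", {"dba", "dbb", "dbc"}, "dbb", "dbd", replace="dbc"))
```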
Moving A Chunk to Another Replication Unit
To move a chunk from one Raft replication unit to another replication unit, use the GDSCTL RELOCATE CHUNK command.
To use RELOCATE CHUNK, the source and target replication unit leaders must be located on the same shard, and their followers must also be on the same shards. If they are not on the same shard, use SWITCHOVER RU to move the leader and MOVE RU to move the followers to co-located shards.
When moving chunks, specify the chunk ID numbers, the source RU ID from which to move them, and the target RU ID to move them to, as shown here.
GDSCTL> relocate chunk -chunk 3,4 -sourceru 1 -targetru 2
The specified chunks must be in the same source replication unit. If -targetru is not specified, a new empty target replication unit is created.
GDSCTL MOVE CHUNK is not supported for moving chunks in a distributed database with Raft replication enabled.
Note:
Because running this command requires a tablespace set for the destination chunk, create a minimum of one tablespace set before running this command.
See also relocate chunk in Global Data Services Concepts and Administration Guide.
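The co-location requirement above is easy to verify programmatically. This Python sketch (illustrative, not GDSCTL internals) checks that two RUs share a leader shard and have matching follower shard sets before a RELOCATE CHUNK would be allowed:

```python
# Illustrative co-location check (not GDSCTL internals) for RELOCATE CHUNK:
# the two RUs' leaders must be on the same shard, and their followers must
# be on the same set of shards.

def colocated(ru_a, ru_b):
    """Each arg: {"leader": shard, "followers": iterable of shards}."""
    return (ru_a["leader"] == ru_b["leader"]
            and set(ru_a["followers"]) == set(ru_b["followers"]))

ru1 = {"leader": "sh1", "followers": {"sh2", "sh3"}}
ru2 = {"leader": "sh1", "followers": {"sh2", "sh3"}}
ru3 = {"leader": "sh2", "followers": {"sh1", "sh3"}}
print(colocated(ru1, ru2))  # True
print(colocated(ru1, ru3))  # False: use SWITCHOVER RU / MOVE RU first
```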
Splitting Chunks in Raft Replication
You can manually split chunks with GDSCTL SPLIT CHUNK in a Raft replication-enabled distributed database.
If you want to move some data within an RU to a new chunk, you can use GDSCTL SPLIT CHUNK to manually split the chunk.
You can then use RELOCATE CHUNK to move the new chunk to another RU if you wish. See Moving A Chunk to Another Replication Unit.
Note:
Because running this command requires a tablespace set for the destination chunk, create a minimum of one tablespace set before running this command.
Getting the Replication Type
To find out if your distributed database is using Raft replication, run CONFIG SDB to see the replication type in the command output.
In the command output, the Replication Type is listed as Native when Raft replication is enabled.
For example,
GDSCTL> config sdb
GDS Pool administrators
------------------------
Replication Type
------------------------
Native
Shard type
------------------------
System-managed
Shard spaces
------------------------
shardspaceora
Services
------------------------
oltp_ro_srvc
oltp_rw_srvc
Starting and Stopping Replication Units
The GDSCTL commands START RU and STOP RU can be used to facilitate maintenance operations.
You might want to stop the RU on a specific replica to disable replication temporarily so that you can do maintenance tasks on the database, operating system, or machine.
You can run the START RU and STOP RU commands for specific replicas within a given replication unit (RU) or for all replicas.
The START RU command is used to resume the operation of previously stopped RUs. Additionally, it can be used in cases where an RU is offline due to errors. For example, if the log producer process for any replica within an RU stops functioning, it results in the RU being halted. The START RU command lets you restart the RU without a complete database restart.
To use the commands, follow this syntax:
start ru -ru ru_id [-database db]
stop ru -ru ru_id [-database db]
You supply the RU ID that you want to start or stop, and you can optionally specify the database name on which a member of the RU runs. If the database is not specified, the commands affect all available replicas of the specified replication unit.
Synchronizing Replication Unit Members
Use the GDSCTL command SYNC RU to synchronize the data of the specified replication unit on all shards. This operation also erases Raft logs and resets the log index and term.
To use SYNC RU, specify the replication unit (-ru ru_id).
gdsctl> sync ru -ru ru_id [-database db]
You can optionally specify a shard database name. If a database is not specified for the SYNC RU command, a replica to synchronize with will be chosen based on the following criteria:
- Pick the replica that was the last leader.
- If not available, pick the replica with the greatest apply index.
The status of the SYNC RU operation can be seen using gdsctl config task.
If you see "Warning: GSM timeout expired", it does not mean that the synchronization operation has stopped running. Replication unit synchronization can take longer to complete than the default GDSCTL timeout.
If you don't want to see "Warning: GSM timeout expired", you can set the GDSCTL global service manager (shard director) request timeout to a higher value.
gdsctl configure -gsm gsm_ID -timeout seconds -save_config
Enable or Disable Reads from Follower Replication Units
Use the database initialization parameter SHARD_ENABLE_RAFT_FOLLOWER_READ to enable or disable reads from follower replication units in a shard.
Set this parameter to TRUE to enable reads, or set it to FALSE to disable reads.
This parameter can have different values on different shards.
See also: SHARD_ENABLE_RAFT_FOLLOWER_READ in Oracle Database Reference.
Viewing Parameter Settings
Use the SHARD_RAFT_PARAMETERS static data dictionary view to see parameters set at an RU level on each shard.
The values for these parameters, if set, can be seen in this view. The columns in the view are:
- ORA_SHARD_ID: shard ID
- RU_ID: replication unit ID
- NAME: parameter name
- VALUE: parameter value
For details about this view, see SHARD_RAFT_PARAMETERS in Oracle Database Reference.
Setting Parameters with GDSCTL
You can set some Raft-specific parameters at the replication unit level on each shard using the GDSCTL set ru parameter command.
Syntax
set ru parameter parameter_name=value [-shard shard_name] [-ru ru_id]
Arguments
Arguments
- parameter_name=value: Specify the parameter name and the value you wish to set it to. See the following topics for details about each parameter setting.
- -ru ru_id: Specify a replication unit ID number. If not specified, the command applies to all RUs.
- -shard shard_name: Specify a shard name. If not specified, the command applies to all shards.
Tuning Flow Control to Mitigate Follower Lag
Flow control in Raft replication coordinates Raft group followers to optimize performance, efficiently utilize memory, and smooth out replication pipeline hiccups, such as variable network latency.
Followers may not consistently maintain the same speed. Occasionally, one might be slightly faster, while at other times, slightly slower.
To tune flow control, set the SHARD_FLOW_CONTROL parameter on the shard where a follower is lagging.
For example,
gdsctl set ru parameter SHARD_FLOW_CONTROL=value
You can optionally specify a shard (-shard) or a replication unit ID number (-ru).
The value argument can be set to one of the following:
- TILL_TIMEOUT (default): As long as the slow follower has received an LCR within a threshold time (see "Configuring Threshold Timeout" below) from the fast follower, the fast follower is throttled. However, if the slow follower falls behind by more than the threshold time, then it is disconnected, at which point it may or may not be able to catch up, depending on why there is a lag between the two followers. For example, if the slow follower is lagging because the network connection to it is bad for a very long time, it will be disconnected. This is also the case if the slow follower is actually a down follower. TILL_TIMEOUT at 10 times the heartbeat interval is the SHARD_FLOW_CONTROL default setting.
- AT_DISTANCE: As long as the slow follower is within a threshold distance (see "Configuring Threshold Distance" below), in terms of LCRs, from the fast follower, the fast follower is throttled. However, if the slow follower falls behind by more than the threshold distance, then it is disconnected, at which point it may or may not be able to catch up, depending on why there is a lag between the two followers. For example, if the slow follower is lagging because the network connection to it is bad for a very long time, it will be disconnected. This is also the case if the slow follower is actually a down follower.
- AT_LOGLIMIT: Flow control does not kick in during normal operation at all, but only starts if the log file is about to be overwritten by the leader but the slow follower still needs LCRs from the file being overwritten. When this situation occurs, the leader waits for the slow follower to consume the LCRs from the file to be overwritten. If the slow follower is actually a down follower, then with this option the leader waits for the slow follower to come online again when the RU's Raft log limit is reached.
Configuring Threshold Distance
Threshold distance, expressed as a percentage of the in-memory queue
size for LCRs, is used by the AT_DISTANCE
option for flow
control.
The default value is 10.
To set the threshold distance to another value, run:
gdsctl set ru parameter SHARD_FLOW_CONTROL_NS_DISTANCE_PCT=number
You can optionally specify a shard (-shard) or a replication unit ID number (-ru).
Configuring Threshold Timeout
Threshold timeout, in milliseconds, is used by the
TILL_TIMEOUT
option for flow control.
The timeout is expressed as a multiple of the heartbeat interval, and the default value is 10.
To set the threshold timeout, run:
gdsctl set ru parameter SHARD_FLOW_CONTROL_TIMEOUT_MULTIPLIER=number
You can optionally specify a shard (-shard) or a replication unit ID number (-ru).
See Setting Parameters with GDSCTL for details about using the set ru parameter command.
Setting Transaction Consensus Timeout
You can change the timeout value for a transaction to get consensus in Raft replication.
To configure the transaction consensus timeout, set the SHARD_TXN_ACK_TIMEOUT_SEC parameter, which specifies the maximum time a user transaction waits for the consensus of its commit before raising ORA-05086.
gdsctl set ru parameter SHARD_TXN_ACK_TIMEOUT_SEC=seconds
You can optionally specify a shard (-shard) or a replication unit ID number (-ru).
By default, Raft replication waits 90 seconds for a transaction to get consensus. However, if the leader is disconnected from the other replicas, it may not get consensus for its commits; if there is low memory in the replication pipeline, the replication of LCRs slows down, resulting in the delayed delivery of acknowledgments. In many cases such as these, 90 seconds may be too long to wait, so you may want to error out a transaction much earlier, depending on your application requirements.
The minimum valid value is 1 second.
See Setting Parameters with GDSCTL for details about using the set ru parameter command.
Dynamic Performance Views for Raft Replication
There are several dynamic performance (V$) views available for Raft replication.
V$SHARD_ACK_SENDER
V$SHARD_ACK_RECEIVER
V$SHARD_APPLY_COORDINATOR
V$SHARD_APPLY_LCR_READER
V$SHARD_APPLY_READER
V$SHARD_APPLY_SERVER
V$SHARD_LCR_LOGS
V$SHARD_LCR_PERSISTER
V$SHARD_LCR_PRODUCER
V$SHARD_NETWORK_SENDER
V$SHARD_MESSAGE_TRACKING
V$SHARD_REPLICATION_UNIT
V$SHARD_TRANSACTION
For descriptions and column details for these views, see Dynamic Performance Views in Oracle Database Reference.