Sharding Keys

Tips for choosing sharding keys, information about constraints rules, and enabling data movement between shards on sharding key updates.

Choosing Sharding Keys

Sharded table partitions are distributed across shards at the tablespace level, based on a sharding key. Examples of keys include customer ID, account number, and country ID.

Sharding keys must adhere to the following characteristics.

  • The sharding key should be very stable; its value should almost never change.

  • The sharding key must be present in all of the sharded tables. This allows the creation of a family of equi-partitioned tables based on the sharding key.

  • Joins between tables in a table family should be performed using the sharding key.

Sharding Keys for System-Managed Distributed Databases

For the system-managed data distribution method, the sharding key must be based on a column that has high cardinality; the number of unique values in this column must be much bigger than the number of shards. Customer ID, for example, is a good candidate for the sharding key, while a United States state name is not.

A sharding key can be a single column or multiple columns. When multiple columns are present, the hash of the columns are concatenated to form the sharding key.

The following examples create a sharded table called Customers and specify that columns cust_id and name form the sharding keys for the table.

CREATE SHARDED TABLE customers
(cust_id     NUMBER NOT NULL
, name        VARCHAR2(50)
, address     VARCHAR2(250)
, region      VARCHAR2(20)
, class       VARCHAR2(3)
, signup      DATE,
CONSTRAINT cust_pk PRIMARY KEY(cust_id, name))
PARTITION BY CONSISTENT HASH (cust_id,name)
PARTITIONS AUTO
TABLESPACE SET ts1;
CREATE SHARDED TABLE Orders
( OrderNo   NUMBER NOT NULL
, CustNo    NUMBER NOT NULL
, Name      VARCHAR2(50) NOT NULL
, OrderDate DATE
, CONSTRAINT OrderPK PRIMARY KEY (CustNo, Name, OrderNo)
, CONSTRAINT CustFK  FOREIGN KEY (CustNo, Name) REFERENCES Customers(Cust_ID, Name)
)
PARTITION BY REFERENCE (CustFK);

Sharding Keys for Composite Distributed Databases

Composite data distribution enables two levels of partitioning - one by list or range and another by consistent hash. This is accomplished by the application providing two keys: a super sharding key and a sharding key.

Composite distribution does not support multi-column LIST partitionsets, as shown here.

CREATE SHARDED TABLE customers (
cust_id     NUMBER NOT NULL,
Name    VARCHAR2(50) NOT NULL,
class VARCHAR2(3) NOT NULL ,
class2 number not null,
CONSTRAINT cust_pk PRIMARY KEY(cust_id,name,class))
PARTITIONSET BY LIST (class, class2)
PARTITION BY CONSISTENT HASH (cust_id,name)
PARTITIONS AUTO (
PARTITIONSET silver VALUES (('SLV',1),('BRZ',2)) TABLESPACE SET ts1
PARTITIONSET gold   VALUES (('GLD',3),('OTH',4)) TABLESPACE SET ts2);

PARTITION BY CONSISTENT HASH (cust_id,name)
*
ERROR at line 8:
ORA-02514: list PARTITIONSET method expects a single partitioning column

Multi-column RANGE partitionsets are supported, as shown below.


CREATE SHARDED TABLE customers (
cust_id     NUMBER NOT NULL,
Name    VARCHAR2(50) NOT NULL,
class number NOT NULL ,
class2 number not null,
CONSTRAINT cust_pk PRIMARY KEY(cust_id,name,class))
PARTITIONSET BY RANGE (class, class2)
PARTITION BY CONSISTENT HASH (cust_id,name)
PARTITIONS AUTO (
PARTITIONSET silver VALUES LESS THAN (10,100) TABLESPACE SET ts1,
PARTITIONSET gold   VALUES LESS THAN (20,200) TABLESPACE SET ts2);

Table created.

In both of the above cases, the sharding key (not the partitionset key) can be multi-column.

Sharding Keys for User-Defined Distributed Databases

For partition by list in user-defined data distribution, Oracle Globally Distributed Database expects a single sharding key column. An error is thrown when multiple columns are specified for a list-partitioned sharded table.

CREATE SHARDED TABLE accounts
( id             NUMBER
, account_number NUMBER
, customer_id    NUMBER
, branch_id      NUMBER
, state          VARCHAR(2) NOT NULL
, state2         VARCHAR(2) NOT NULL
, status         VARCHAR2(1)
)
PARTITION BY LIST (state,state2)
( PARTITION p_northwest VALUES ('OR', 'WA') TABLESPACE ts1
, PARTITION p_southwest VALUES ('AZ', 'UT', 'NM') TABLESPACE ts2
, PARTITION p_northcentral VALUES ('SD', 'WI') TABLESPACE ts3
, PARTITION p_southcentral VALUES ('OK', 'TX') TABLESPACE ts4
, PARTITION p_northeast VALUES ('NY', 'VM', 'NJ') TABLESPACE ts5
, PARTITION p_southeast VALUES ('FL', 'GA') TABLESPACE ts6
);

ERROR at line 1:
ORA-03813: list partition method expects a single partitioning column in
user-defined sharding

For a range-partitioned sharded table, you can specify multiple columns as sharding key columns.


CREATE SHARDED TABLE accounts
( id             NUMBER
, account_number NUMBER
, customer_id    NUMBER
, branch_id      NUMBER
, state          NUMBER NOT NULL
, state2         NUMBER NOT NULL
, status         VARCHAR2(1)
)
PARTITION BY RANGE (state, state2)
( PARTITION p_northwest VALUES LESS THAN(10, 100) TABLESPACE ts1
, PARTITION p_southwest VALUES LESS THAN(20,200) TABLESPACE ts2);

Table created.

But in both cases, the sharding key (not the partitionset key) can be multi-column.

Sharding Key Type Support

The following data types are supported for the sharding key.

  • NUMBER

  • INTEGER

  • SMALLINT

  • RAW

  • VARCHAR

  • (N)VARCHAR2

  • (N)CHAR

  • DATE

  • TIMESTAMP

Primary Key and Foreign Key Constraints

In a Oracle Globally Distributed Database environment, the primary key constraints and foreign key constraints are controlled by the following rules.

  • For primary keys, there are unique constraints and unique indexes on sharded tables; the column list must contain the sharding key columns. In earlier Oracle releases the restriction was that the sharding key must be a prefix of such columns, but this rule is now more relaxed.

  • Foreign keys from one sharded table to another sharded table also must contain the sharding key. This is automatically enforced because a foreign key refers to either the primary key or unique columns of the referenced table.

  • Foreign keys on sharded tables must be within the same table family. This is required because different table families have different sharding key columns.

  • Foreign keys in sharded tables referencing local tables are not allowed.

  • Foreign keys in sharded tables referencing duplicated tables are not allowed.

  • Foreign keys in duplicated table referencing sharded tables are not allowed.

Enabling Automatic Data Movement on Sharding Key Update

You can update the sharding key for any particular record directly on the shard where the data is located, using a normal SQL UPDATE statement. Oracle Globally Distributed Database moves the data to the correct shard automatically, as a distributed transaction in the background.

Note:

It is recommended that you commit your previous work and start a new transaction before you update a sharding key, because a distributed transaction has a higher risk than local transaction in the case of a remote machine or network failure. To avoid race condition, the rows being updated are locked before starting the insertion-deletion operation.

Note:

This feature is not supported when Raft replication is configured.

Use Case

Sometimes a sharding key value for a particular record must be updated. For example, employee location can be one of the sharding keys. When employees move from one country to another, their country value must be updated, in which case the data must be moved to the shard mapped to the new key value.

What happens when I update a sharding key?

When a record’s sharding key value is updated in a sharded table, the record could end up in three possible locations after the update, depending on which shard the new value is mapped to.

1. The row stays within the same partition.

2. The row is moved to a different partition in the same shard.

3. The row is moved to a different shard.

When the sharding key value on a particular row of a sharded table is updated, Oracle Globally Distributed Database handles moving the data to a new location, whether it is in a different partition on the same shard or on a different shard.

Support for Sharded Table Family

A table family is a parent-child relationship between database tables. Multiple tables linked by such relationships typically form a tree-like hierarchy where every child has a single parent. A table family can be defined using reference partitioning or the PARENT clause.

Support for automatic row movement in a table family which was created with the PARENT clause, requires primary key-foreign key constraints to be added between the parent and child table.

In a table family which was created with reference partitioning, the primary key-foreign key constraints between the parent and child table already exist, so automatic row movement is supported and no extra step is needed.

Enable and Disable Automatic Data Movement

This operation uses the ROW MOVEMENT clause on the sharded table in the database.

ALTER TABLE tablename ENABLE ROW MOVEMENT;

ALTER TABLE tablename DISABLE ROW MOVEMENT;

ROW MOVEMENT can also be specified on CREATE TABLE.

When ROW MOVEMENT is enabled and there is an update to the sharding key value, the data is transparently moved between shards.

For details about the ROW MOVEMENT clause, see row_movement_clause in Oracle Database SQL Language Reference.

Enable Automatic Data Movement in a Table Family

In the case of a table family, ROW MOVEMENT must be enabled on the child table first, and then on the parent table.

In the example below, accounts is the root table, orders is a child of accounts, lineitems is a child of orders.

ALTER TABLE lineitems ENABLE ROW MOVEMENT;

ALTER TABLE orders ENABLE ROW MOVEMENT;

ALTER TABLE accounts ENABLE ROW MOVEMENT;