Sharding Keys
Tips for choosing sharding keys, information about constraints rules, and enabling data movement between shards on sharding key updates.
Choosing Sharding Keys
Sharded table partitions are distributed across shards at the tablespace level, based on a sharding key. Examples of keys include customer ID, account number, and country ID.
Sharding keys must adhere to the following characteristics.
-
The sharding key should be very stable; its value should almost never change.
-
The sharding key must be present in all of the sharded tables. This allows the creation of a family of equi-partitioned tables based on the sharding key.
-
Joins between tables in a table family should be performed using the sharding key.
Sharding Keys for System-Managed Distributed Databases
For the system-managed data distribution method, the sharding key must be based on a column that has high cardinality; the number of unique values in this column must be much bigger than the number of shards. Customer ID, for example, is a good candidate for the sharding key, while a United States state name is not.
A sharding key can be a single column or multiple columns. When multiple columns are present, the hash of the columns are concatenated to form the sharding key.
The following examples create a sharded table called Customers and specify that columns
cust_id
and name
form the sharding keys for the
table.
CREATE SHARDED TABLE customers
(cust_id NUMBER NOT NULL
, name VARCHAR2(50)
, address VARCHAR2(250)
, region VARCHAR2(20)
, class VARCHAR2(3)
, signup DATE,
CONSTRAINT cust_pk PRIMARY KEY(cust_id, name))
PARTITION BY CONSISTENT HASH (cust_id,name)
PARTITIONS AUTO
TABLESPACE SET ts1;
CREATE SHARDED TABLE Orders
( OrderNo NUMBER NOT NULL
, CustNo NUMBER NOT NULL
, Name VARCHAR2(50) NOT NULL
, OrderDate DATE
, CONSTRAINT OrderPK PRIMARY KEY (CustNo, Name, OrderNo)
, CONSTRAINT CustFK FOREIGN KEY (CustNo, Name) REFERENCES Customers(Cust_ID, Name)
)
PARTITION BY REFERENCE (CustFK);
Sharding Keys for Composite Distributed Databases
Composite data distribution enables two levels of partitioning - one by list or range and another by consistent hash. This is accomplished by the application providing two keys: a super sharding key and a sharding key.
Composite distribution does not support multi-column LIST
partitionsets, as shown here.
CREATE SHARDED TABLE customers (
cust_id NUMBER NOT NULL,
Name VARCHAR2(50) NOT NULL,
class VARCHAR2(3) NOT NULL ,
class2 number not null,
CONSTRAINT cust_pk PRIMARY KEY(cust_id,name,class))
PARTITIONSET BY LIST (class, class2)
PARTITION BY CONSISTENT HASH (cust_id,name)
PARTITIONS AUTO (
PARTITIONSET silver VALUES (('SLV',1),('BRZ',2)) TABLESPACE SET ts1
PARTITIONSET gold VALUES (('GLD',3),('OTH',4)) TABLESPACE SET ts2);
PARTITION BY CONSISTENT HASH (cust_id,name)
*
ERROR at line 8:
ORA-02514: list PARTITIONSET method expects a single partitioning column
Multi-column RANGE
partitionsets are supported, as shown below.
CREATE SHARDED TABLE customers (
cust_id NUMBER NOT NULL,
Name VARCHAR2(50) NOT NULL,
class number NOT NULL ,
class2 number not null,
CONSTRAINT cust_pk PRIMARY KEY(cust_id,name,class))
PARTITIONSET BY RANGE (class, class2)
PARTITION BY CONSISTENT HASH (cust_id,name)
PARTITIONS AUTO (
PARTITIONSET silver VALUES LESS THAN (10,100) TABLESPACE SET ts1,
PARTITIONSET gold VALUES LESS THAN (20,200) TABLESPACE SET ts2);
Table created.
In both of the above cases, the sharding key (not the partitionset key) can be multi-column.
Sharding Keys for User-Defined Distributed Databases
For partition by list in user-defined data distribution, Oracle Globally Distributed Database expects a single sharding key column. An error is thrown when multiple columns are specified for a list-partitioned sharded table.
CREATE SHARDED TABLE accounts
( id NUMBER
, account_number NUMBER
, customer_id NUMBER
, branch_id NUMBER
, state VARCHAR(2) NOT NULL
, state2 VARCHAR(2) NOT NULL
, status VARCHAR2(1)
)
PARTITION BY LIST (state,state2)
( PARTITION p_northwest VALUES ('OR', 'WA') TABLESPACE ts1
, PARTITION p_southwest VALUES ('AZ', 'UT', 'NM') TABLESPACE ts2
, PARTITION p_northcentral VALUES ('SD', 'WI') TABLESPACE ts3
, PARTITION p_southcentral VALUES ('OK', 'TX') TABLESPACE ts4
, PARTITION p_northeast VALUES ('NY', 'VM', 'NJ') TABLESPACE ts5
, PARTITION p_southeast VALUES ('FL', 'GA') TABLESPACE ts6
);
ERROR at line 1:
ORA-03813: list partition method expects a single partitioning column in
user-defined sharding
For a range-partitioned sharded table, you can specify multiple columns as sharding key columns.
CREATE SHARDED TABLE accounts
( id NUMBER
, account_number NUMBER
, customer_id NUMBER
, branch_id NUMBER
, state NUMBER NOT NULL
, state2 NUMBER NOT NULL
, status VARCHAR2(1)
)
PARTITION BY RANGE (state, state2)
( PARTITION p_northwest VALUES LESS THAN(10, 100) TABLESPACE ts1
, PARTITION p_southwest VALUES LESS THAN(20,200) TABLESPACE ts2);
Table created.
But in both cases, the sharding key (not the partitionset key) can be multi-column.
Sharding Key Type Support
The following data types are supported for the sharding key.
-
NUMBER
-
INTEGER
-
SMALLINT
-
RAW
-
VARCHAR
-
(N)VARCHAR2
-
(N)CHAR
-
DATE
-
TIMESTAMP
Primary Key and Foreign Key Constraints
In a Oracle Globally Distributed Database environment, the primary key constraints and foreign key constraints are controlled by the following rules.
-
For primary keys, there are unique constraints and unique indexes on sharded tables; the column list must contain the sharding key columns. In earlier Oracle releases the restriction was that the sharding key must be a prefix of such columns, but this rule is now more relaxed.
-
Foreign keys from one sharded table to another sharded table also must contain the sharding key. This is automatically enforced because a foreign key refers to either the primary key or unique columns of the referenced table.
-
Foreign keys on sharded tables must be within the same table family. This is required because different table families have different sharding key columns.
-
Foreign keys in sharded tables referencing local tables are not allowed.
-
Foreign keys in sharded tables referencing duplicated tables are not allowed.
-
Foreign keys in duplicated table referencing sharded tables are not allowed.
Enabling Automatic Data Movement on Sharding Key Update
You can update the sharding key for any
particular record directly on the shard where the data is located, using a normal
SQL UPDATE
statement. Oracle Globally Distributed Database moves the data to the correct shard automatically, as a distributed transaction in the
background.
Note:
It is recommended that you commit your previous work and start a new transaction before you update a sharding key, because a distributed transaction has a higher risk than local transaction in the case of a remote machine or network failure. To avoid race condition, the rows being updated are locked before starting the insertion-deletion operation.Note:
This feature is not supported when Raft replication is configured.Use Case
Sometimes a sharding key value for a particular record must be updated. For example, employee location can be one of the sharding keys. When employees move from one country to another, their country value must be updated, in which case the data must be moved to the shard mapped to the new key value.
What happens when I update a sharding key?
When a record’s sharding key value is updated in a sharded table, the record could end up in three possible locations after the update, depending on which shard the new value is mapped to.
1. The row stays within the same partition.
2. The row is moved to a different partition in the same shard.
3. The row is moved to a different shard.
When the sharding key value on a particular row of a sharded table is updated, Oracle Globally Distributed Database handles moving the data to a new location, whether it is in a different partition on the same shard or on a different shard.
Support for Sharded Table Family
A table family is a parent-child relationship between database tables.
Multiple tables linked by such relationships typically form a tree-like hierarchy
where every child has a single parent. A table family can be defined using reference
partitioning or the PARENT
clause.
Support for automatic row movement in a table family which was created with the
PARENT
clause, requires primary key-foreign key constraints to
be added between the parent and child table.
In a table family which was created with reference partitioning, the primary key-foreign key constraints between the parent and child table already exist, so automatic row movement is supported and no extra step is needed.
Enable and Disable Automatic Data Movement
This operation uses the ROW MOVEMENT
clause on the
sharded table in the database.
ALTER TABLE tablename ENABLE ROW MOVEMENT;
ALTER TABLE tablename DISABLE ROW MOVEMENT;
ROW MOVEMENT
can also be specified on CREATE
TABLE
.
When ROW MOVEMENT
is enabled and there is an update to
the sharding key value, the data is transparently moved between shards.
For details about the ROW MOVEMENT
clause, see row_movement_clause in Oracle Database SQL
Language Reference.
Enable Automatic Data Movement in a Table Family
In the case of a table family, ROW MOVEMENT
must be enabled on the
child table first, and then on the parent table.
In the example below, accounts
is the root table,
orders
is a child of accounts
,
lineitems
is a child of orders
.
ALTER TABLE lineitems ENABLE ROW MOVEMENT;
ALTER TABLE orders ENABLE ROW MOVEMENT;
ALTER TABLE accounts ENABLE ROW MOVEMENT;