Schema Objects

To obtain the benefits of Oracle Globally Distributed Database, the schema of a distributed database should be designed in a way that maximizes the number of database requests processed on a single shard.

Partitions, Tablespaces, and Chunks

Distribution of partitions across shards is achieved by creating partitions in tablespaces that reside on different shards.

Each partition of a sharded table is stored in a separate tablespace, making the tablespace the unit of data distribution in a distributed database.

As described in Sharded Table Family, to minimize the number of multi-shard joins, corresponding partitions of all tables in a table family are always stored in the same shard. This is guaranteed when tables in a table family are created in the same set of distributed tablespaces as shown in the syntax examples where tablespace set ts1 is used for all tables.

However, it is possible to create different tables from a table family in different tablespace sets, for example the Customers table in tablespace set ts1 and Orders in tablespace set ts2. In this case, it must be guaranteed that the tablespace that stores partition 1 of Customers always resides in the same shard as the tablespace that stores partition 1 of Orders.

To support this functionality, a set of corresponding partitions from all of the tables in a table family, called a chunk, is formed. A chunk contains a single partition from each table of a table family. This guarantees that related data from different sharded tables can be moved together. In other words, a chunk is the unit of data migration between shards. With system-managed and composite data distribution methods, the number of chunks within each shard is specified when the distributed database is created. With user-defined data distribution, the total number of chunks is equal to the number of partitions.

A chunk that contains corresponding partitions from the tables of Cutomers-Orders-LineItems schema is shown in the following figure.

Figure 2-2 Chunk as a Set of Partitions

Description of Figure 2-2 follows
Description of "Figure 2-2 Chunk as a Set of Partitions"

Each shard contains multiple chunks as shown in the following figure.

In addition to sharded tables, a shard can also contain one or more duplicated tables. Duplicated tables cannot be stored in tablespaces that are used for sharded tables.

Tablespace Sets

Oracle Globally Distributed Database creates and manages tablespaces as a unit called a TABLESPACE SET.

A distributed database configured with the system-managed and composite data distribution methods use TABLESPACE SET, while user-defined data distribution uses regular tablespaces.

A tablespace is a logical unit of data distribution in a distributed database. The distribution of partitions across shards is achieved by automatically creating partitions in tablespaces that reside on different shards.

To minimize the number of multi-shard joins, the corresponding partitions of related tables are always stored in the same shard. Each partition of a sharded table is stored in a separate tablespace.

The PARTITIONS AUTO clause specifies that the number of partitions should be automatically determined. This type of hashing provides more flexibility and efficiency in migrating data between shards, which is important for elastic scalability.

The number of tablespaces created per tablespace set is determined based on the number of chunks that were defined for the shardspace during deployment.

Note:

Only Oracle Managed Files are supported by tablespace sets.

Individual tablespaces cannot be dropped or altered independently of the entire tablespace set.

TABLESPACE SET cannot be used with the user-defined data distribution method.

Sharded Tables

A database table is split up across the shards, so that each shard contains the table with the same columns, but a different subset of rows. A table split up in this manner is called a sharded table.

The following figure shows how a set of large tables (referred to as a table family), can be horizontally partitioned across the three shards, so that each shard contains a subset of the data, indicated with red, yellow, and blue rows.

Figure 2-4 Horizontal Partitioning of a Table Across Shards



Partitions are distributed across shards at the tablespace level, based on a sharding key. Examples of keys include customer ID, account number, and country ID.

Each partition of a sharded table resides in a separate tablespace, and each tablespace is associated with a specific shard. Depending on the data distribution method, the association can be established automatically or defined by the administrator.

Even though the partitions of a sharded table reside in multiple shards, to the application, the table looks and behaves exactly the same as a partitioned table in a single database. SQL statements issued by an application never have to refer to shards or depend on the number of shards and their configuration.

The familiar SQL syntax for table partitioning specifies how rows should be partitioned across shards. For example, the following SQL statement creates a sharded table, horizontally partitioning the table across shards based on the sharding key cust_id.

CREATE SHARDED TABLE customers 
( cust_id     NUMBER NOT NULL
, name        VARCHAR2(50)
, address     VARCHAR2(250)
, region      VARCHAR2(20)
, class       VARCHAR2(3)
, signup      DATE
,CONSTRAINT cust_pk PRIMARY KEY(cust_id)
)
PARTITION BY CONSISTENT HASH (cust_id)
PARTITIONS AUTO
TABLESPACE SET ts1
;

The sharded table is partitioned by consistent hash, a special type of hash partitioning commonly used in scalable distributed systems. This technique automatically spreads tablespaces across shards to provide an even distribution of data and workload.

Note:

Global indexes on sharded tables are not supported, but local indexes are supported.

Sharded Table Family

A sharded table family is a set of tables that are sharded in the same way. Often there is a parent-child relationship between database tables with a referential constraint in a child table (foreign key) referring to the primary key of the parent table.

Multiple tables linked by such relationships typically form a tree-like structure where every child has a single parent. A set of such tables is referred to as a table family. A table in a table family that has no parent is called the root table. There can be only one root table in a table family.

How a Table Family Is Sharded

Sharding a table family is illustrated here with the Customers–Orders–LineItems schema.

Before sharding, the tables in the schema may look as shown in the examples below. The three tables have a parent-child relationship, with Customers as the root table.

Customers Table (Root) Before Sharding

CustNo    Name       Address        Location  Class
--------- ---------- -------------- --------- ------
123       Brown      100 Main St    us3       Gold
456       Jones      300 Pine Ave   us1       Silver
999       Smith      453 Cherry St  us2       Bronze

Orders Table Before Sharding

OrderNo   CustNo   OrderDate
--------- -------- -----------
4001      123      14-FEB-2013
4002      456      09-MAR-2013
4003      456      05-APR-2013
4004      123      27-MAY-2013
4005      999      01-SEP-2013

LineItems Table Before Sharding

LineNo  OrderNo  CustNo  StockNo    Quantity
------  -------  ------  -------    --------
40011   4001     123     05683022   1
40012   4001     123     45423609   4
40013   4001     123     68584904   1
40021   4002     456     05683022   1
40022   4002     456     45423509   3
40022   4003     456     80345330   16
40041   4004     123     45423509   1
40042   4004     123     68584904   2
40051   4005     999     80345330   12

The tables can be sharded by the customer number, CustNo, in the Customers table, which is the root. The shard containing data pertaining to customer 123 is shown in the following example tables.

Customers Table Shard With Customer 123 Data

CustNo    Name       Address        Location   Class
--------- ---------- -------------- ---------- ------
123       Brown      100 Main St    us3        Gold

Orders Table Shard With Customer 123 Data

OrderNo   CustNo   OrderDate
--------- -------- -----------
4001      123      14-FEB-2013
4004      123      27-MAY-2013

LineItems Table Shard With Customer 123 Data

LineNo  OrderNo  CustNo  StockNo    Quantity
------  -------  ------  -------    --------
40011   4001     123     05683022   1
40012   4001     123     45423609   4
40013   4001     123     68584904   1
40041   4004     123     45423509   1
40042   4004     123     68584904   2

Duplicated Tables

In Oracle Globally Distributed Database a table with the same contents in each shard is called a duplicated table.

A distributed database includes both sharded tables that are horizontally partitioned across shards, and duplicated tables that are replicated to all shards.

Duplicated tables are a good choice for relatively small tables that are not updated frequently, and that are often accessed together with sharded tables. For this reason, duplicated tables usually contain reference information, for example, a Stock Items table that is common to each shard.

For many applications, the number of database requests handled by a single shard can be maximized by duplicating read-only or read-mostly tables across all shards. The combination of sharded tables and duplicated tables enables all transactions associated with a sharding key to be processed by a single shard. This technique enables linear scalability and fault isolation.

As an example of the need for a duplicated table, consider the table family that is described in Sharded Table Family. This database schema might also include a Products table which contains data that is shared by all the customers in the shards that were created for this table family, and it cannot be sharded by the customer number. To prevent multi-shard queries during order processing, the entire table can be duplicated on all shards in the distributed database.

The difference between sharded tables (Customers, Orders, and Line Items) and a duplicated table (Products) is shown in the following figure.

Figure 2-5 Sharded Tables and a Duplicated Table

Description of Figure 2-5 follows
Description of "Figure 2-5 Sharded Tables and a Duplicated Table"

In the figure above, the Customers, Orders, and Line Items tables are all sharded by a Customer ID number into three shards, illustrated by the colors of the rows in each table at the top of the figure, and the corresponding color of the sharded table in each shard. The duplicated table , Products, shown in gray, is replicated to all of the shards in its entirety, shown by the arrow from the table pointing to each of the three shards.

See Creating Duplicated Tables for more information, limitations, and examples.

Non-Table Objects Created on All Shards

In addition to duplicated tables, other schema objects, such as users, roles, views, indexes, synonyms, functions, procedures, and packages, and non-schema database objects, such as tablespaces, tablespace sets, directories, and contexts, can be created on all shards in an Oracle Globally Distributed Database.

Unlike tables, which require an extra keyword in the CREATE statement—SHARDED or DUPLICATED—other objects are created on all shards using existing syntax. The only requirement is that the SHARD DDL session property must be enabled.

Note that automatic creation on all shards of the following objects is not supported in this release. These objects can be created by connecting to individual shards.

  • Cluster

  • Control file

  • Database link

  • Disk group

  • Edition

  • Flashback archive

  • Materialized zone map

  • Outline

  • Pfile

  • Profile

  • Restore point

  • Rollback segment

  • Summary

Materialized views and view logs are supported starting in Oracle Database 18c, with the following restrictions:

  • Materialized views created on sharded tables remain empty on the catalog database, while the corresponding materialized views on shards contain data from each of the individual shards.

  • Only the REFRESH COMPLETE ON DEMAND USING TRUSTED CONSTRAINTS option is supported for materialized views on sharded tables.