Choosing a Sharding Key

SODA collections are backed by regular Oracle tables. One of the columns in these tables is the ID column, which contains unique keys for the documents in the collection. This column can be used as the sharding key. Alternatively, you can choose a JSON field in the document content to be the sharding key.

The choice of sharding key is application dependent.

The advantages and disadvantages of each sharding key choice are listed in the sections below.

Using the SODA ID as the Sharding Key

The SODA API automatically manages a unique ID for each SODA document. This ID is used by the SODA API to create and retrieve documents within a collection.

When the SODA ID is used as the sharding key, the application must supply it. To create a new document on a specific shard, the application needs the sharding key beforehand so that it can connect to the appropriate shard. The SODA API supports this client-assigned (CLIENT key) assignment of a SODA ID on document creation. Examples are provided in the code samples in Using SODA ID as the Sharding Key.

The application decides whether this SODA ID represents something meaningful (for example, a Customer ID) or is merely a unique Document ID. In either case, the ID must be unique. This requirement is imposed by the SODA API, not by Oracle Globally Distributed Database.

A summary of using the SODA ID as the sharding key:

The sharding key must be unique.
The sharding key is a document ID, which can be independent of the contents of the JSON fields.
Whenever a new document is inserted, this ID must be provided by the application.
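
The ordering constraint summarized above can be sketched in plain Python. This is a minimal in-memory model and does not use the SODA API: the class `ShardedCollection`, the helper `new_document_id`, and the CRC32-based routing are all hypothetical stand-ins. The sketch only illustrates that the ID must exist before a shard can be chosen, and that duplicate IDs are rejected.

```python
import uuid
import zlib


class ShardedCollection:
    """In-memory stand-in for a collection sharded by a client-assigned ID."""

    def __init__(self, shard_count):
        # One dict per shard, mapping document ID -> document content.
        self.shards = [{} for _ in range(shard_count)]

    def pick_shard(self, doc_id):
        # Hypothetical routing: a stable hash of the ID selects the shard.
        return zlib.crc32(doc_id.encode("utf-8")) % len(self.shards)

    def insert(self, doc_id, content):
        # The ID must be known first, because it decides which shard is used.
        shard = self.shards[self.pick_shard(doc_id)]
        if doc_id in shard:
            raise KeyError(f"duplicate document ID: {doc_id}")
        shard[doc_id] = content

    def get(self, doc_id):
        # The same ID routes back to the same shard for retrieval.
        return self.shards[self.pick_shard(doc_id)][doc_id]


def new_document_id():
    # The application, not the database, generates the unique ID.
    return uuid.uuid4().hex
```

In this model, a caller would first call `new_document_id()` (or use a meaningful value such as a Customer ID), and only then call `insert`, mirroring the "key first, connection second" order described above.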

Using a JSON Field as the Sharding Key

A JSON field can be used as the sharding key. This key does not need to be unique.

In this case, each document in the collection still has its own SODA ID, as SODA requires, but the SODA API generates and manages that ID automatically, independently of the sharding key.

A summary of using a JSON field as the sharding key:

The sharding key does not need to be unique.
The sharding key is a field within the JSON of each document.
The SODA ID does not need to be specified when inserting a new document.
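
As a rough illustration of the points above, the following plain-Python sketch routes documents by a field inside the JSON content while generating document IDs automatically. It is an in-memory model, not the SODA API: the class `FieldShardedCollection`, the `ZIP` field name, and the CRC32-based routing are assumptions made for the example.

```python
import itertools
import zlib


class FieldShardedCollection:
    """In-memory stand-in for a collection sharded by a JSON field."""

    def __init__(self, shard_count, key_field):
        self.key_field = key_field
        # One dict per shard, mapping generated document ID -> content.
        self.shards = [{} for _ in range(shard_count)]
        self._ids = itertools.count(1)

    def pick_shard(self, key_value):
        # Hypothetical routing: a stable hash of the field value.
        return zlib.crc32(str(key_value).encode("utf-8")) % len(self.shards)

    def insert(self, content):
        # The sharding key is read from the document itself.
        key_value = content[self.key_field]
        # The document ID is generated for the caller, not supplied by it.
        doc_id = f"doc-{next(self._ids)}"
        self.shards[self.pick_shard(key_value)][doc_id] = content
        return doc_id

    def find(self, key_value):
        # All documents sharing a key value live on the same shard.
        shard = self.shards[self.pick_shard(key_value)]
        return [d for d in shard.values() if d[self.key_field] == key_value]
```

Two documents with the same `ZIP` value receive different generated IDs but land on the same shard, which is why the sharding key does not need to be unique in this approach.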

Considerations in Choosing a Sharding Key Method

In both cases, the sharding key should be a value that rarely or never changes. This might be a uniquely assigned Customer or Document ID. It can also be a non-unique value, such as a customer's full birth date (day, month, and year) or a postal code.

For system-managed sharding, either sharding key method is appropriate for distributing documents across shards.

For user-defined sharding, using the SODA ID as the sharding key only makes sense if the ID carries a meaningful value that can sensibly be partitioned, for example by range.

Given no other constraints, using a JSON field as the sharding key offers greater flexibility and allows the sharding key to be stored naturally as part of the JSON.

System-managed vs. User-defined Sharding

Although the two methods are similar in many ways, user-defined sharding gives you greater control over where data resides. This can be useful when data must be separated geographically, or when there is some other reason that data requires a specific physical mapping.

Most of the procedures and examples in later topics apply to both sharding methods. There are two exceptions:

When you create the sharded table that underlies the SODA collection, you must specify the physical mapping for user-defined sharding. For an example in which ranges of ZIP codes must reside on specific shards, see Using a JSON Field as a Sharding Key.
SODA queries (QBEs) can rely on this data grouping to run a query against a single shard that holds a range of sharding keys.
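
The idea behind a range-based physical mapping can be sketched in plain Python. The ZIP code boundaries and shard names below are invented for illustration and do not come from the later examples. The sketch only shows how a key range resolves to shards, and why a narrow range query can be served by a single shard.

```python
import bisect

# Hypothetical mapping: each shard holds ZIP codes below its upper bound
# and at or above the previous bound (the first shard starts at 0).
UPPER_BOUNDS = [30000, 70000, 100000]
SHARDS = ["shard1", "shard2", "shard3"]


def _check(zip_code):
    value = int(zip_code)
    if not 0 <= value < UPPER_BOUNDS[-1]:
        raise ValueError(f"ZIP code out of range: {zip_code}")
    return value


def shard_for_zip(zip_code):
    # bisect_right returns the index of the first bound greater than the value.
    return SHARDS[bisect.bisect_right(UPPER_BOUNDS, _check(zip_code))]


def shards_for_zip_range(low, high):
    # A query over [low, high] touches every shard between the two endpoints.
    first = bisect.bisect_right(UPPER_BOUNDS, _check(low))
    last = bisect.bisect_right(UPPER_BOUNDS, _check(high))
    return SHARDS[first:last + 1]
```

With this mapping, a query over ZIP codes 94000 through 94999 resolves to one shard, while a query that straddles a boundary, such as 29000 through 31000, must visit two.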

How to Implement a Solution

After choosing which type of sharding key to use, refer to the following use cases to see examples of how to create a sharded table for the JSON collection, and how to interact with the sharded table from an application.