Using a JSON Field as a Sharding Key

You can designate a JSON field to be the sharding key when creating your distributed database schema.

The examples in the topics that follow show you how to create a sharded table for the JSON collection, create the SODA mapping, trigger the sharding key column population, and access the sharded table from an application with Java and Python code samples.

Creating a Sharded Table for the JSON Collection

To create a sharded table that uses a sharding key other than the SODA ID:

  1. Connect to the shard catalog as the Sharding/SODA user.
  2. Enable SHARD DDL.
  3. Create a tablespace set.
  4. Run CREATE SHARDED TABLE, as shown in the example below.

The following examples create a sharded table (Customers) for a JSON collection of customer profile documents (CUSTPROFILE).

A column for the SODA ID (ID) identifies the JSON entries. When creating a JSON entry in the table with SODA, the application populates the ID column with a unique value.

A sharding key column (ZIP) is the ZIP code value extracted from the JSON document.

The other columns are the default column names given when SODA creates a table to hold an underlying collection. You can see this for yourself when creating a SODA collection and then examining the created table.

Note that the ID column by itself cannot be the primary key. The PK must be or must include the sharding key, in this case ZIP. In the application examples, both ID and ZIP are used to work with the sharded data. In the example above the PK consists of the sharding key and the SODA ID (ZIP, ID), because ZIP will not be a unique value by itself.

Note that in Oracle 21c, you can use either (ZIP, ID) or (ID, ZIP) as the combined Primary Key. In general, you should expect access to this table to be for these values individually, not as a combination. SODA access for these examples looking for ID and customer queries might be using the JSON field (ZIP in this case), so you will create individual indexes in any case. .

Choosing a good sharding key depends on the usage and application requirements. You can use a unique sharding key, for example a Customer ID, but in that case you could also use the SODA ID to store the sharding key.

Creating a Sharded Table: System-Managed

/* Enable shard DDL */
ALTER SESSION ENABLE SHARD DDL;

/* Create a tablespace set */
CREATE TABLESPACE SET TSP_SET_1 USING TEMPLATE
 (datafile size 100m autoextend on next 10M maxsize unlimited
 extent management local segment space management auto); 

/* Create the sharded table */
CREATE SHARDED TABLE CUSTOMERS
(
"ID" VARCHAR2(255) NOT NULL,
"CREATED_ON" timestamp default sys_extract_utc(SYSTIMESTAMP) NOT NULL,
"LAST_MODIFIED" timestamp default sys_extract_utc(SYSTIMESTAMP) NOT NULL,
"VERSION" varchar2(255) NOT NULL,
"ZIP" VARCHAR2(60) NOT NULL,
"CUSTPROFILE" JSON,
PRIMARY KEY (ID,ZIP))
TABLESPACE SET TSP_SET_1
PARTITION BY CONSISTENT HASH (ZIP) PARTITIONS AUTO;

Creating a Sharded Table: User-Defined

Ensure that all of the necessary tablespaces and shardspaces have been created.

/* Enable shard DDL */
ALTER SESSION ENABLE SHARD DDL;

/* Create the sharded table */
CREATE SHARDED TABLE CUSTOMERS
(
"ID" VARCHAR2(255) NOT NULL,
"CREATED_ON" timestamp default sys_extract_utc(SYSTIMESTAMP) NOT NULL,
"LAST_MODIFIED" timestamp default sys_extract_utc(SYSTIMESTAMP) NOT NULL,
"VERSION" varchar2(255) NOT NULL,
"ZIP" VARCHAR2(60) NOT NULL,
"CUSTPROFILE" JSON,
PRIMARY KEY (ID,ZIP))
PARTITION BY RANGE (ZIP)
(PARTITION p1 VALUES LESS THAN ('50000') TABLESPACE ts1,
PARTITION p2 VALUES LESS THAN ('99999') TABLESPACE ts2)

Creating a Mapped SODA Collection on the Sharded Table

Create a map to let SODA know which columns to use when working with the sharded table, and add the sharded table to the list of collections.

You can run a procedure to create the map, but this procedure also must be run on ALL of the shards in the distributed database. The procedure also needs to be run on any shards added in the future. You can accomplish both of these requirements using a sharding-specific PL/SQL procedure, sys.exec_shard_plsql().

To create a SODA map across all shards:

As the Sharding/SODA user and with SHARD DDL enabled, run the following commands on the shard catalog. The shard catalog propagates the procedure to all of the shards to be processed automatically.

create or replace procedure COLLECTION_PROC_CUSTOMERS AS
METADATA varchar2(8000);
COL SODA_COLLECTION_T;
begin
METADATA := '{"tableName":"CUSTOMERS",
"keyColumn":{"name":"ID"},
"contentColumn":{"name":"CUSTPROFILE","sqlType":"JSON"},
"versionColumn":{"name":"VERSION","method":"UUID"},
"lastModifiedColumn":{"name":"LAST_MODIFIED"},
"creationTimeColumn":{"name":"CREATED_ON"},
"readOnly":false}';
 -- Create a collection using "map" mode, based on
 -- the table you've created above and specified in 
 -- the custom metadata under "tableName" field.
COL := dbms_soda.create_collection('CUSTOMERS',METADATA,DBMS_SODA.CREATE_MODE_MAP);
end ;
/
 
exec sys.exec_shard_plsql('app_schema.collection_proc_customers()',4+1);

Note that the keyColumn is ID, the key used by SODA to insert and retrieve collections. There is no reference to the ZIP column because it is not used by SODA in the mapping.

At this point, a new collection has been created just as if you had run a CREATE COLLECTION command.

You can run some PL/SQL to list out the collections. On the shard catalog, run the following command, and verify that the output lists the Customers table.

SET SERVEROUTPUT ON
DECLARE
l_coll_list SODA_COLLNAME_LIST_T;
BEGIN
l_coll_list := DBMS_SODA.list_collection_names;
 
IF l_coll_list.COUNT > 0 THEN
FOR i IN 1 .. l_coll_list.COUNT LOOP
DBMS_OUTPUT.put_line(i || ' : ' || l_coll_list(i));
END LOOP;
END IF;
END;
/
1 : CUSTOMERS
 
PL/SQL procedure successfully completed.
 
SQL>

Creating a Trigger to Populate the Sharding Key

When SODA inserts or updates the document, it automatically populates the underlying table columns described in the collection metadata (that is ID, CUSTPROFILE, LAST_MODIFIED, CREATED_ON, and VERSION). However, you also need to populate the ZIP column, and the value must come from within the JSON document. This is accomplished using a trigger.

Note that this is a BEFORE trigger, which allows you to populate a column even when that column is the primary key.

Run the following statements on the shard catalog as the application schema user. The procedure sys.exec_shard_plsql ensures that it is also run on all shards and all future shards.

alter session enable shard ddl
 
create or replace procedure COLLECTION_BF_ZIP_CUSTOMERS AS
begin
EXECUTE IMMEDIATE 'alter session enable shard operations';
EXECUTE IMMEDIATE q'%
Create or Replace TRIGGER CUST_BF_TRIG
BEFORE INSERT or UPDATE on CUSTOMERS
FOR EACH ROW
begin
:new.ZIP := JSON_VALUE(:NEW.CUSTPROFILE, '$.ZIP' error on error error on empty);
end;
%';
end;
/

exec sys.exec_shard_plsql('app_schema.collection_bf_zip_customers()',4+1+2);

In the example above, ZIP is assumed to be a top-level field in the JSON document. If the value is in a nested field, for example under an ADDRESS field, you must include the field hierarchy, for example '$.ADDRESS.ZIP'.

Code Samples

The Java and Python code samples for "Using a JSON Field as the Sharding Key" demonstrate how you can actually work with JSON objects using SODA in a distributed database environment.

In these examples, you connect to a shard using the sharding key and insert a new document.

Note that when using SODA in a distributed database environment, new documents should be created by connecting to specific shards and not using the shard catalog.

Java Code Sample

The Java code sample below shows you how to insert JSON documents in a collection where the data is sharded by a JSON field, ZIP code in this example.

import java.sql.Connection;
import java.util.Properties;
import java.util.List;
 
// SODA specific imports
import oracle.soda.rdbms.OracleRDBMSClient;
import oracle.soda.OracleDatabase;
import oracle.soda.OracleCursor;
import oracle.soda.OracleCollection;
import oracle.soda.OracleDocument;
import oracle.soda.OracleException;
 
// Sharding and UCP imports
import oracle.jdbc.OracleShardingKey;
import oracle.jdbc.OracleType;
import oracle.jdbc.pool.OracleDataSource;
import oracle.ucp.jdbc.PoolDataSourceFactory;
import oracle.ucp.jdbc.PoolDataSource;
 
 
/*
* The sample demonstrates connecting to a distributed database using
* Oracle JDBC driver and UCP as a client side connection pool.
*/
public class QuickInsertShardJSONField {
 
 public static void main(String args[]) throws Exception {
 
// TNS_ADMIN - Should be the path where the tnsnames.ora file resides
// dbshard_rw - It is the TNS alias present in tnsnames.ora.
// Note that the connection is to the Shard Director (GSM) and the service name is the shard RW service
final String DB_URL="jdbc:oracle:thin:@dbshard_rw?TNS_ADMIN=/home/opc/dbhome/";
 
// Update the Database Username and Password to the Shard User
final String DB_USER = "app_schema";
String DB_PASSWORD = "<user_password>" ;
 
 
// Get the PoolDataSource for UCP
PoolDataSource pds = PoolDataSourceFactory.getPoolDataSource();
 
// Set the connection factory first before all other properties
pds.setConnectionFactoryClassName(OracleDataSource.class.getName());
pds.setURL(DB_URL);
pds.setUser(DB_USER);
pds.setPassword(DB_PASSWORD);
pds.setConnectionPoolName("JDBC_UCP_POOL");
 
// Default is 0. Set the initial number of connections to be created
// when UCP is started.
pds.setInitialPoolSize(10);
// Default is 0. Set the minimum number of connections
// that is maintained by UCP at runtime.
pds.setMinPoolSize(10);
// Instead of Max Pool Size, we can set the number of max connections per shard
pds.setMaxConnectionsPerShard(20);
 
// We cannot get the connection until we have the Shard key which is part of the SQL
//We first set the sharding key which in our case is the value of the ZIP code field
String shardingKeyVal="94065";
 
// Now we build the connection using this shard key
OracleShardingKey sdkey = pds.createShardingKeyBuilder().subkey(shardingKeyVal, OracleType.VARCHAR2).build();
System.out.println("Initiating UCP and Creating Connection...");
Connection conn = pds.createConnectionBuilder().shardingKey(sdkey).build();
 
// Enable the SODA Shared Metadata cache
Properties props = new Properties();
props.put("oracle.soda.sharedMetadataCache", "true");
OracleRDBMSClient cl = new OracleRDBMSClient(props);
 
// Get a DB Connection
OracleDatabase db = cl.getDatabase(conn);
 
// Print all the Collections in this DB
List<String> names =  db.admin().getCollectionNames();
for (String name : names)
 System.out.println ("Collection name: " + name);
 
// Open up the CUSTOMERS Collection
OracleCollection col = db.openCollection("CUSTOMERS");
 
//We do not provide an SODA ID column.
//This is provided by SODA when the document is created
// Note that the ZIP field MUST match what we have specified as the key
OracleDocument cDoc = db.createDocumentFromString("{\"name\": \"Matilda\", \"State\": \"CA\", \"ZIP\":\"94065\"}");
 
// Insert the document above
OracleDocument insertedDoc = col.insertAndGet(cDoc);
 
// Get the document key
String dockey = insertedDoc.getKey();
 
// Get the document back by key
// We are still connected to the same shard
OracleDocument doc = col.find().key(dockey).getOne();
String content = doc.getContentAsString();
System.out.println("Retrieved content is: " + content);
 
// We are done, so close the connection to the shard
conn.close();
 
// At this point we could open up a new shard connection using a different sharding key
 
 
 }} // End of QuickInsertShardJSONField

Python Code Sample

This code sample in Python shows how you can actually work with JSON objects using SODA in a distributed database environment.

To use this sample code in your environment, follow the instructions to install the cx_Oracle module for Python: https://cx-oracle.readthedocs.io/en/latest/user_guide/installation.html

In this example, you connect to a shard using the sharding key and insert a new document.

Note that when using SODA in a distributed database environment, new documents should be created by connecting to specific shards and not using the shard catalog.

# import the cx_Oracle module for Python
import cx_Oracle
 
# Create a connection pool that will be used for connecting to all shards
# The components of the dsn are hostname (shard director),
# port (usually 1522), global service (created using GDSCTL)
# We also enable SODA metadata caching
dsn=cx_Oracle.makedsn("shard_director_host",1522,service_name="service_name")
pool=cx_Oracle.SessionPool("app_schema","password",dsn,soda_metadata_cache=True)

# Connect to a specific shard by using the shard key, a ZIP code. which in this
# example is set explicitly as '94065', but this might be passed in or part of a loop 
# You must know beforehand whether you are creating or working with a document
# with a specific ZIP code value.
connection=pool.acquire(shardingkey=["94065"])
 
# set autocommit and open the CUSTOMERS collection
connection.autocommit = True
soda = connection.getSodaDatabase()
collection = soda.openCollection("CUSTOMERS")
 
# Insert a document
# A system generated SODA key is created by default.
content = {'name': 'Matilda', 'STATE': 'CA', 'ZIP': '94065'}
doc = collection.insertOneAndGet(content)
 
# The SODA key can now be used to work with this document directly
# We can retrieve it immediately
key = doc.key
print('The key of the new SODA document is: ', key)
 
# Fetch the document back by this same SODA key. 
# This only works because we are still connected to the same shard
doc = collection.find().key(key).getOne()
content = doc.getContent()
print('Retrieved SODA document dictionary is:')
print(content)
 

# Next, add another customer. We are in the shard containing 94065,
# so we can add a customer with the same ZIP code '94065'
content = {'name': 'Mildred', 'STATE': 'CA', 'ZIP: '94065'}
doc = collection.insertOneAndGet(content)
 
# Now do a query.
# It is important to note that this query is ONLY executed within this one shard,
# the shard which contains the part of the sharded table with 94065 ZIP codes.
# In other words, the actual query has the additional bound of customers whose
# names start with 'M' in 94065
# and any other ZIPs stored on this shard. This is unlikely to be a useful query
# for system-managed sharding.
documents = collection.find().filter({'name': {'$like': 'M%'}}).getDocuments()
for d in documents:
    content = d.getContent()
    print(content["name"])
 
# After you have finished, release this connection back into the pool
pool.release(connection)
 
# If you want to add or work with more customers with a different
# shard key start with another connection
# For example: connection=pool.acquire(shardingkey=["10012"]) and so on.
 
# When you are completely finished working with customers, shut down the pool.
pool.close()

This code sample shows you how to run a multi-shard query to return all customer names in all shards whose names begin with an "M".

import cx_Oracle
 
# Create an unpooled connection to the shard catalog
# The connect string connects to the shard director, but uses the catalog service,
# e.g. GD$catalog.oradbcloud
connection = cx_Oracle.connect("app_schema","password","db_connect_string")

# Open the CUSTOMERS collection
connection.autocommit = True
soda = connection.getSodaDatabase()
collection = soda.openCollection("CUSTOMERS")
  
# Now query the collection
# It is important to note that this is a query across ALL shards
# In other words, you will get ALL users whose name starts with M across ALL Zip codes
documents = collection.find().filter({'name': {'$like': 'M%'}}).getDocuments()
for d in documents:
    content = d.getContent()
    print(content["name"])
  
#Close the connection
connection.close()