Oracle AI Vector Search in a Distributed Database

Oracle Globally Distributed Database support for AI Vector Search includes most of the distributed database functionality.

The support includes the following:

  • Creation of sharded and duplicated tables with vector data type columns.

  • Creation of vector indexes on sharded and duplicated tables, including Inverted File Flat (IVF) index and Hierarchical Navigable Small World (HNSW) index.

  • DMLs can be issued from the shard catalog on sharded tables and duplicated tables with vector data types.

  • DMLs can be issued from shards on duplicated tables with vector datatype columns.

  • Vector search queries on sharded tables and duplicated tables can be issued from the shard catalog or from the shards using the direct routing capability.

  • Vector search queries issued on the shard catalog are analyzed and transformed to identify the part of the query that will be sent to the shards and the part that needs to be run on the catalog.

  • Vector search is supported with all types of data distribution: system sharding, user-defined sharding, composite sharding, and directory based sharding.

  • The procedures in the packages DBMS_VECTOR and DBMS_VECTOR_CHAIN are supported in Globally Distributed Database.

There are some limitations:

  • Sharding keys: Globally Distributed Database only supports sharding keys on non-vector columns. The vector data can be distributed across shards using a primary key on any other non-vector column identified as a sharding key.

  • Raft replication: A distributed database using the Raft replication method does not support vector columns.

AI Vector Search can benefit from what a distributed database has to offer, that is, to distribute data across several databases to:

  • Comply with data sovereignty regulations

  • Reduce the risk of unavailability of all the data

  • Allow the scalability by increasing the throughput and reducing latency

Vectors in Distributed Database Tables

There is no new SQL syntax or keyword when creating sharded tables and duplicated tables with vector columns in a Globally Distributed Database; however, there are some requirements and restrictions to consider.

User Permissions

Only an all-shards user can create sharded and duplicated tables. You must connect to the shard catalog as an all-shards user. Connecting to the shard catalog as an all-shards user automatically enables SHARD DDL, and the DDL to create the tables is propagated to all the shards in the distributed database.

Creating Sharded Tables with a Vector Column

  • Sharded tables must be created on the catalog database with SHARD DDL enabled.

  • A vector column cannot be part of the sharding key or the partitionset key.

  • The CREATE SHARDED TABLE command is propagated to all of the shards by the shard coordinator.

The syntax to create a sharded table with a vector column is same as the syntax to create a non-sharded table with a vector column. The only difference is to include the SHARDED keyword in the CREATE TABLE statement.

CREATE SHARDED TABLE REALTORS(
     REALTOR_ID NUMBER PRIMARY KEY,
     NAME VARCHAR2(20),
     IMAGE VECTOR,
     ZIPCODE VARCHAR2(40)) 
PARTITION BY CONSISTENT HASH(REALTOR_ID) 
TABLESPACE SET TS1; 

Creating Duplicated Tables with a Vector Column

  • Duplicated tables must be created on the shard catalog database with SHARD DDL enabled.

The syntax to create a duplicated table with a vector column is same as the syntax to create a non-sharded table with a vector column. The only difference is to include the DUPLICATED keyword in the CREATE TABLE statement.

CREATE DUPLICATED TABLE PRODUCT_DESCRIPTIONS
     (
     PRODUCT_ID          NUMBER(6,0) NOT NULL,
     ORDER_ID            NUMBER(6,0) NOT NULL,
     LANGUAGE_ID         VARCHAR2(6 BYTE),
     TRANSLATED_NAME     NVARCHAR2(50),
     TRANSLATED_DESCRIPTION NVARCHAR2(2000),
     VECT4 VECTOR,
     VECT5 VECTOR,
     CONSTRAINT  PRODUCT_DESCRIPTIONS_PK primary key (PRODUCT_ID)
     ) tablespace products
     STORAGE (INITIAL 1M NEXT 1M);

Vector Indexes in a Globally Distributed Database

Inverted File Flat (IVF) index and Hierarchical Navigable Small World (HNSW) index are supported on sharded tables in a distributed database; however there are some considerations.

Note:

  • Global indexes are not supported on sharded tables; however, this limitation does not exist for the global HNSW and IVF index.

  • Hybrid Vector Indexes (HVI) are not currently supported on sharded tables.

Inverted File Flat Index

Inverted File Flat Index (IVF Flat or simply IVF) is a partitioned-based index that lets you balance high-search quality with reasonable speed.

You can create a local IVF index on vector columns in a sharded table. There is no syntax change required. However, you must create the index partitions, partitions of $IVF_FLAT_CENTROIDS and $IVF_FLAT_CENTROID_PARTITIONS in relevant chunk tablespaces to facilitate move chunks across shards.

  • IVF indexes and HNSW indexes on a sharded table must be created on the shard catalog database with SHARD DDL enabled.

  • The CREATE INDEX command is propagated as is to all of the shards by the shard coordinator. The CREATE INDEX clause scope is the shard.

There is no syntax change to create an IVF index on a sharded table, when compared to the syntax to create an IVF index on a non-sharded table.

CREATE VECTOR INDEX ivf_image 
     ON houses (image) 
 ORGANIZATION NEIGHBOR PARTITIONS WITH TARGET ACCURACY 95 
 DISTANCE EUCLIDEAN PARAMETERS 
 (type IVF, NEIGHBOR PARTITIONS 1000) PARALLEL 16;

Hierarchical Navigable Small World Index

There is no syntax change to create a Hierarchical Navigable Small World (HNSW) index on a sharded table, when compared to the syntax to create an HNSW index on a non-sharded table.

CREATE VECTOR INDEX hnsw_image 
     ON houses (image) 
 ORGANIZATION INMEMORY NEIGHBOR GRAPH
 WITH TARGET ACCURACY 95;