Oracle AI Vector Search in a Distributed Database
Oracle Globally Distributed Database support for AI Vector Search includes most of the distributed database functionality.
The support includes the following:
-
Creation of sharded and duplicated tables with vector data type columns.
-
Creation of vector indexes on sharded and duplicated tables, including Inverted File Flat (IVF) index and Hierarchical Navigable Small World (HNSW) index.
-
DMLs can be issued from the shard catalog on sharded tables and duplicated tables with vector data types.
-
DMLs can be issued from shards on duplicated tables with vector datatype columns.
-
Vector search queries on sharded tables and duplicated tables can be issued from the shard catalog or from the shards using the direct routing capability.
-
Vector search queries issued on the shard catalog are analyzed and transformed to identify the part of the query that will be sent to the shards and the part that needs to be run on the catalog.
-
Vector search is supported with all types of data distribution: system sharding, user-defined sharding, composite sharding, and directory based sharding.
-
The procedures in the packages
DBMS_VECTOR
andDBMS_VECTOR_CHAIN
are supported in Globally Distributed Database.
There are some limitations:
-
Sharding keys: Globally Distributed Database only supports sharding keys on non-vector columns. The vector data can be distributed across shards using a primary key on any other non-vector column identified as a sharding key.
-
Raft replication: A distributed database using the Raft replication method does not support vector columns.
AI Vector Search can benefit from what a distributed database has to offer, that is, to distribute data across several databases to:
-
Comply with data sovereignty regulations
-
Reduce the risk of unavailability of all the data
-
Allow the scalability by increasing the throughput and reducing latency
Vectors in Distributed Database Tables
There is no new SQL syntax or keyword when creating sharded tables and duplicated tables with vector columns in a Globally Distributed Database; however, there are some requirements and restrictions to consider.
User Permissions
Only an all-shards user can create sharded and duplicated tables. You must connect to
the shard catalog as an all-shards user. Connecting to the shard catalog as an
all-shards user automatically enables SHARD DDL
, and the DDL to
create the tables is propagated to all the shards in the distributed database.
Creating Sharded Tables with a Vector Column
-
Sharded tables must be created on the catalog database with
SHARD DDL
enabled. -
A vector column cannot be part of the sharding key or the partitionset key.
-
The
CREATE SHARDED TABLE
command is propagated to all of the shards by the shard coordinator.
The syntax to create a sharded table with a vector column is same as the syntax to
create a non-sharded table with a vector column. The only difference is to include
the SHARDED
keyword in the CREATE TABLE
statement.
CREATE SHARDED TABLE REALTORS(
REALTOR_ID NUMBER PRIMARY KEY,
NAME VARCHAR2(20),
IMAGE VECTOR,
ZIPCODE VARCHAR2(40))
PARTITION BY CONSISTENT HASH(REALTOR_ID)
TABLESPACE SET TS1;
Creating Duplicated Tables with a Vector Column
-
Duplicated tables must be created on the shard catalog database with
SHARD DDL
enabled.
The syntax to create a duplicated table with a vector column is same as the syntax to
create a non-sharded table with a vector column. The only difference is to include
the DUPLICATED
keyword in the CREATE TABLE
statement.
CREATE DUPLICATED TABLE PRODUCT_DESCRIPTIONS
(
PRODUCT_ID NUMBER(6,0) NOT NULL,
ORDER_ID NUMBER(6,0) NOT NULL,
LANGUAGE_ID VARCHAR2(6 BYTE),
TRANSLATED_NAME NVARCHAR2(50),
TRANSLATED_DESCRIPTION NVARCHAR2(2000),
VECT4 VECTOR,
VECT5 VECTOR,
CONSTRAINT PRODUCT_DESCRIPTIONS_PK primary key (PRODUCT_ID)
) tablespace products
STORAGE (INITIAL 1M NEXT 1M);
Vector Indexes in a Globally Distributed Database
Inverted File Flat (IVF) index and Hierarchical Navigable Small World (HNSW) index are supported on sharded tables in a distributed database; however there are some considerations.
Note:
-
Global indexes are not supported on sharded tables; however, this limitation does not exist for the global HNSW and IVF index.
-
Hybrid Vector Indexes (HVI) are not currently supported on sharded tables.
Inverted File Flat Index
Inverted File Flat Index (IVF Flat or simply IVF) is a partitioned-based index that lets you balance high-search quality with reasonable speed.
You can create a local IVF index on vector columns in a sharded table.
There is no syntax change required. However, you must create the index partitions,
partitions of $IVF_FLAT_CENTROIDS
and
$IVF_FLAT_CENTROID_PARTITIONS
in relevant chunk tablespaces to
facilitate move chunks across shards.
-
IVF indexes and HNSW indexes on a sharded table must be created on the shard catalog database with
SHARD DDL
enabled. -
The
CREATE INDEX
command is propagated as is to all of the shards by the shard coordinator. TheCREATE INDEX
clause scope is the shard.
There is no syntax change to create an IVF index on a sharded table, when compared to the syntax to create an IVF index on a non-sharded table.
CREATE VECTOR INDEX ivf_image
ON houses (image)
ORGANIZATION NEIGHBOR PARTITIONS WITH TARGET ACCURACY 95
DISTANCE EUCLIDEAN PARAMETERS
(type IVF, NEIGHBOR PARTITIONS 1000) PARALLEL 16;
Hierarchical Navigable Small World Index
There is no syntax change to create a Hierarchical Navigable Small World (HNSW) index on a sharded table, when compared to the syntax to create an HNSW index on a non-sharded table.
CREATE VECTOR INDEX hnsw_image
ON houses (image)
ORGANIZATION INMEMORY NEIGHBOR GRAPH
WITH TARGET ACCURACY 95;