Artificial Intelligence Vectors and Semantic Search
Oracle Database 23ai introduces semantic search capabilities using Artificial Intelligence (AI) vector search. These capabilities include a new vector data type, vector indexes, and vector search SQL operators that allow the database to store semantic document content, images, and other unstructured data as vectors and then run fast performing similarity queries. The key innovation is that the database better understands user intent and the search context to find similar matches, rather than only find exact matches.
Building upon this capability, the Oracle database can use Generative AI constructs, such as Retrieval Augmented Generation (RAG) to combine large language models (LLMs) and private business data to respond to natural language questions. RAG provides higher accuracy and avoids exposing any of the private data to the LLM training data.
Oracle Database Vector Data Types
There are two Oracle database vector types: dense and sparse. Dense vectors were the first AI vector type Oracle introduced in Oracle Database 23ai. They are ideal for data sets in which nearly every dimension stores a non-zero value, meaning they hold much rich and complex information, patterns, and relationships. A great use case for dense vectors is semantic search when word meaning nuances better identify matches.
On the other hand, sparse vectors typically have a large number of dimensions and few non-zero dimension values. To improve efficiency, they only store non-zero values. That makes sparse vectors more storage and memory efficient. Moreover, ODP.NET can process them faster than dense vectors assuming the data profile has few non-zero dimension values. They are suited for other AI application types, such as text retrieval systems in which a search phrase identifies relevant matching documents.
ODP.NET Vectors
Starting with version 23.3.2, managed ODP.NET and ODP.NET Core support Oracle database vector data access as existing .NET or ODP.NET data types: array
s, string
s, CLOB
s, and VARCHAR2
s. In managed ODP.NET and ODP.NET Core 23.8, new ODP.NET vector-specific types for dense vectors (OracleVector
) and sparse vectors (OracleSparseVector
) are introduced for richer user interfaces that more naturally and easily operate on vectors.
ODP.NET vector data types are flexible to use. They have one or more dimensions, up to 65,536 (64K) dimensions. Developers can define vectors with a fixed or variable number of dimensions. The numeric format of each dimension in a vector can be of the same numeric format or a different numeric format for each dimension:
Table 3-28 Dense Vector Numeric Formats
Vector Numeric Format | Description | Mapped .NET Type | Mapped ODP.NET Type | Size | Examples |
---|---|---|---|---|---|
|
packed UINT8 bytes where each dimension is a single bit |
|
|
1 byte |
|
|
8-bit integers |
|
|
2 bytes |
|
|
32-bit floating point |
|
|
4 bytes |
|
|
64-bit floating point |
|
|
8 bytes |
|
For example, a one dimension INT8 vector would be Vector(1, INT8). A four dimension FLOAT32 vector would be Vector(4, FLOAT32). A variable dimension FLOAT64 vector would be Vector(*, FLOAT64).
To specify that each vector dimension can be either an INT8, FLOAT32, or FLOAT64 data type, use the * notation, such as Vector(4, *) to represent a four dimensional vector. You may specify a vector as Vector(*, *) to have a variable number of dimensions and variable data type vector.
Table 3-29 Sparse Vector Numeric Formats
Vector Numeric Format | Description | Mapped .NET Type | Mapped ODP.NET Type | Size | Example |
---|---|---|---|---|---|
|
8-bit integers |
|
|
2 bytes |
|
|
32-bit integers |
|
|
4 bytes |
|
|
64-bit integers |
|
|
8 bytes |
|
ODP.NET OracleVector
(dense) can use all four numeric formats. ODP.NET OracleSparseVector
(sparse) can use all except BINARY
.
ODP.NET can retrieve the database vector data type in .NET as any of the following:
Data Type | ODP.NET or .NET Data Type | Description |
---|---|---|
.NET numeric array |
.NET |
This array type can be |
.NET byte array |
.NET |
Only Oracle dense vectors can be retrieved as .NET byte arrays. Oracle Note:
|
.NET string |
.NET |
This contains the vector representation in JSON format. |
|
ODP.NET |
This contains the vector representation in JSON format. |
|
ODP.NET |
This contains the vector representation in JSON format. |
|
ODP.NET |
ODP.NET native dense vector data type. This is the base type for all ODP.NET native vector data types. |
|
ODP.NET |
ODP.NET native sparse vector data type. |
ODP.NET can bind vector data as the following .NET types for SQL or stored procedure execution:
Data Type | .NET or ODP.NET Data Type | Bind Type | Description |
---|---|---|---|
|
ODP.NET |
|
ODP.NET native vector data types. ODP.NET will bind the Vector data type as the |
|
ODP.NET |
|
ODP.NET native dense vector data type. ODP.NET will bind the Vector data type based on the binding type accordingly. |
|
ODP.NET |
|
ODP.NET native sparse vector data type. ODP.NET will bind the Vector data type based on the binding type accordingly. |
Any numeric array |
.NET |
|
This array type can be any numeric array type. ODP.NET will infer the numeric array type and bind the Vector data type as |
Any numeric array |
.NET |
|
This array type can be any numeric array type. ODP.NET will bind the Vector data type based on the binding type accordingly. |
.NET byte array |
.NET |
|
All vector numeric formats can be bound as .NET byte arrays. |
.NET string, OracleString |
.NET or ODP.NET |
|
This contains the vector representation in JSON format. ODP.NET will bind the Vector data type as |
.NET string, OracleString |
.NET or ODP.NET |
|
This contains the vector representation in JSON format. ODP.NET will bind the Vector data type based on the binding type accordingly. |
OracleClob |
ODP.NET |
|
This contains the vector representation in JSON format. |
Note:
Sparse vectors do not support binding as OracleDbType.Vector_Binary
nor .NET byte array types.
When using vectors with OracleDataAdapter ReturnProviderSpecificTypes
set to false
, for dense VECTORs, a .NET numeric array will populate the DataTable
or DataSet
after the Fill
method is called; and for sparse VECTORs, a .NET string representing the sparse vector will be used. When true
, OracleVector
or OracleSparseVector
will be used.
For dense VECTOR columns of '*
' numeric format and ReturnProviderSpecificTypes
set to false
, a mix of short[]
, float[]
, double[]
, and/or byte[]
will be populated based on the VECTOR’s numeric format upon an OracleDataAdapter
Fill()
. For dense VECTOR columns of all other formats, short[]
, float[]
, double[]
, or byte[]
will be populated based on the VECTOR column's numeric format.
Since vectors cannot be compared directly with each other, they cannot be used as JOIN keys, ORDER BY keys, GROUP BY keys, or other related scenarios. Thus, for OracleCommandBuilder
, vector comparison is not included in the SQL WHERE clause of the generated commands.
Vectors as CLOB or VARCHAR2 Data Type
Unmanaged ODP.NET, as well as managed ODP.NET and ODP.NET Core prior to version 23.3.2, support vectors using existing CLOB
or VARCHAR2
data types. In these cases, vectors are stored with a JSON format. Data retrieval and manipulation use existing CLOB
or VARCHAR2
or .NET APIs.
In later managed ODP.NET and ODP.NET Core versions, vectors can remain stored as CLOB
s or VARCHAR2
s for backwards compatibility purposes by setting the OracleConfiguration
MapVectorColumnAsClob
property to true.
When sparse vectors are sent to ODP.NET versions that do not support sparse vectors natively, the vector data becomes CLOB
data in JSON dense vector format.
ODP.NET Vector Sample Code
//This ODP.NET artificial intelligence (AI) vector sample demonstrates how to insert,
retrieve, update, and delete multi-dimensional FLOAT64, FLOAT32, and INT8 vector data
types using the Oracle database.
//Requires ODP.NET 23ai (23.3.2) or higher and Oracle Database 23ai (23.4) or higher.
//Add User Id, Password, and Data Source, such as Easy Connect Plus or TNS, to the connection
string to connect to the DB.
using Oracle.ManagedDataAccess.Client;
namespace VectorDemo
{
public class VectorDemo
{
//Provide User Id, Password, and Data Source values for your database.
public const string conStr = "User Id=<USER>;Password=<PASSWORD>;Data Source=<DATA SOURCE>;";
public const int id = 1;
public static void Main(string[] args)
{
using (OracleConnection con = new OracleConnection(conStr))
{
using (OracleCommand cmd = con.CreateCommand())
{
try
{
con.Open();
cmd.CommandText = "begin " +
"execute immediate 'drop table VectorTable';" +
"exception when others then if sqlcode <> -942 then raise;" +
"end if;" +
"end;";
cmd.ExecuteNonQuery();
//Create table with 2 dimensional FLOAT64, 3 dimensional FLOAT32, and 4 dimensional INT8 vector columns
cmd.CommandText = "create table VectorTable (id number, float64s vector(2, float64),
float32s vector(3, float32), int8s vector(4, INT8), constraint pk primary key (id))";
cmd.ExecuteNonQuery();
//Insert vector row into DB
InsertVectors();
//Update vector values in DB
UpdateVectors();
//Delete vector values in DB
DeleteVectors();
}
catch (Exception ex)
{
Console.WriteLine(ex.Message);
}
}
}
}
public static void InsertVectors()
{
string sql = "insert into VectorTable values (:id, :float64, :float32, :int8)";
try
{
double[] doubles = new double[] { 1.234, 2.345 };
float[] floats = new float[] { 1.23f, 2.34f, 3.45f };
short[] int16s = new short[] { 1, 2, 3, 4 };
OracleConnection con = new OracleConnection(conStr);
con.Open();
OracleCommand cmd = new OracleCommand(sql, con);
cmd.Parameters.Add("id", OracleDbType.Int16, 0, id, System.Data.ParameterDirection.Input);
//Binding FLOAT64 variable as vector - doubles
cmd.Parameters.Add("float64", OracleDbType.Vector, 0, doubles, System.Data.ParameterDirection.Input);
//Binding FLOAT32 variable as vector - floats
cmd.Parameters.Add("float32", OracleDbType.Vector, 0, floats, System.Data.ParameterDirection.Input);
//Binding INT8 variable as vector - int16s
cmd.Parameters.Add("int8", OracleDbType.Vector, 0, int16s, System.Data.ParameterDirection.Input);
// Insert vectors into VectorTable
cmd.ExecuteNonQuery();
//Retrieve vector values from DB
RetrieveVectors();
}
catch (Exception ex)
{
Console.WriteLine(ex.Message);
}
}
public static void UpdateVectors()
{
string sql = "update VectorTable set float64s=:float64, float32s=:float32, int8s=:int8 where id=:id";
try
{
double[] doubles = new double[] { 9.876, 8.765 };
float[] floats = new float[] { 9.87f, 8.76f, 7.65f };
short[] int16s = new short[] { 9, 8, 7, 6 };
OracleConnection con = new OracleConnection(conStr);
con.Open();
OracleCommand cmd = new OracleCommand(sql, con);
//Binding FLOAT64 variable as vector - doubles
cmd.Parameters.Add("float64", OracleDbType.Vector_Float64, 0, doubles, System.Data.ParameterDirection.Input);
//Binding FLOAT32 variable as vector - floats
cmd.Parameters.Add("float32", OracleDbType.Vector_Float32, 0, floats, System.Data.ParameterDirection.Input);
//Binding INT8 variable as vector - int16s
cmd.Parameters.Add("int8", OracleDbType.Vector_Int8, 0, int16s, System.Data.ParameterDirection.Input);
cmd.Parameters.Add("id", OracleDbType.Int16, 0, id, System.Data.ParameterDirection.Input);
// Update vectors in VectorTable
cmd.ExecuteNonQuery();
Console.WriteLine("Database vector values updated!");
//Retrieve vector values from DB
RetrieveVectors();
}
catch (Exception ex)
{
Console.WriteLine(ex.Message);
}
}
public static void DeleteVectors()
{
string sql = "delete from VectorTable where id=:id";
try
{
OracleConnection con = new OracleConnection(conStr);
con.Open();
OracleCommand cmd = new OracleCommand(sql, con);
cmd.Parameters.Add("id", OracleDbType.Int16, 0, id, System.Data.ParameterDirection.Input);
// Delete row with vector values from VectorTable
cmd.ExecuteNonQuery();
Console.WriteLine("Database vector values deleted!");
//Confirm vectors removed
RetrieveVectors();
}
catch (Exception ex)
{
Console.WriteLine(ex.Message);
}
}
public static void RetrieveVectors()
{
string sql = "select * from VectorTable";
try
{
OracleConnection con = new OracleConnection(conStr);
con.Open();
OracleCommand cmd = new OracleCommand(sql, con);
OracleDataReader reader = cmd.ExecuteReader();
//Use ODP.NET vector accessors to retrieve data
if (reader.Read())
{
Console.WriteLine("Retrieve FLOAT64S Vector value:");
double[] vecD = reader.GetDoubleArray(1);
PrintDoubles(vecD);
Console.WriteLine("Retrieve FLOAT32S Vector value:");
float[] vecF = reader.GetFloatArray("FLOAT32S");
PrintFloats(vecF);
Console.WriteLine("Retrieve INT8S vector value:");
short[] vecInt8s = reader.GetInt16Array(3);
PrintInt16s(vecInt8s);
Console.WriteLine();
}
//Return no results if no vector row is found
else
Console.WriteLine("No vector row found.");
}
catch (Exception ex)
{
Console.WriteLine(ex.Message);
}
}
static public void PrintDoubles(double[] doubles)
{
Console.Write("{");
for (int i = 0; i < doubles.Length - 1; i++)
Console.Write(doubles[i].ToString() + ", ");
Console.WriteLine(doubles[doubles.Length - 1].ToString() + "}");
}
static public void PrintFloats(float[] floats)
{
Console.Write("{");
for (int i = 0; i < floats.Length - 1; i++)
Console.Write(floats[i].ToString() + ", ");
Console.WriteLine(floats[floats.Length - 1].ToString() + "}");
}
static public void PrintInt16s(Int16[] int16s)
{
Console.Write("{");
for (int i = 0; i < int16s.Length - 1; i++)
Console.Write(int16s[i].ToString() + ", ");
Console.WriteLine(int16s[int16s.Length - 1].ToString() + "}");
}
}
}