Perform Exact Similarity Search
A similarity search looks for the relative order of vectors compared to a query vector. Naturally, the comparison is done using a particular distance metric but what is important is the result set of your top closest vectors, not the distance between them.
As an example, and given a certain query vector, you can calculate its distance to all other vectors in your data set. This type of search, also called flat search, or exact search, produces the most accurate results with perfect search quality. However, this comes at the cost of significant search times. This is illustrated by the following diagrams:
With an exact search, you compare the query vector vq
against every other vector in your space by calculating its distance to each vector. After calculating all of these distances, the search returns the nearest k
of those as the nearest matches. This is called a k-nearest neighbors (kNN) search.
For example, the Euclidean similarity search involves retrieving the top-k nearest
vectors in your space relative to the Euclidean distance metric and a query vector.
Here's an example that retrieves the top 10 vectors from the vector_tab
table that are the nearest to query_vector
using the following exact
similarity search query:
SELECT docID
FROM vector_tab
ORDER BY VECTOR_DISTANCE( embedding, :query_vector, EUCLIDEAN )
FETCH EXACT FIRST 10 ROWS ONLY;
In this example, docID
and embedding
are columns defined in the vector_tab
table and embedding
has the VECTOR
data type.
In the case of Euclidean distances, comparing squared distances is equivalent to comparing distances. So, when ordering is more important than the distance values themselves, the Euclidean Squared distance is very useful as it is faster to calculate than the Euclidean distance (avoiding the square-root calculation). Consequently, it is simpler and faster to rewrite the query like this:
SELECT docID
FROM vector_tab
ORDER BY VECTOR_DISTANCE( embedding, :query_vector, EUCLIDEAN_SQUARED)
FETCH FIRST 10 ROWS ONLY;
Note:
TheEXACT
keyword is optional. If omitted while connected to an ADB-S
instance, an approximate search using a vector index is attempted if one exists. For
more information, see Perform Approximate Similarity Search Using Vector
Indexes.
Note:
Ensure that you use the distance function that was used to train your embedding model.See Also:
Oracle Database SQL
Language Reference for the full syntax of the
ROW_LIMITING_CLAUSE
Parent topic: Query Data With Similarity and Hybrid Searches