Custom Distance Function

JavaScript user-defined functions can be used to define a custom vector distance. This provides greater flexibility in the types of distance equations that can be employed, extending vector search functionality to a broader range of use cases.

A custom distance function is a user-defined JavaScript function declared in a Multilingual Engine (MLE) inline call specification. Its signature must match that of the built-in distance functions: it must accept exactly two arguments of type VECTOR and return a BINARY_DOUBLE. The function signature must also include the DETERMINISTIC keyword. The following function definition provides an example of a custom distance function, in this case implementing the squared Euclidean distance:

CREATE OR REPLACE FUNCTION euclidean_sq_vector_distance("a" VECTOR, "b" VECTOR)
RETURN BINARY_DOUBLE
DETERMINISTIC PARALLEL_ENABLE
AS MLE LANGUAGE JAVASCRIPT PURE
{{
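	// Both vectors are expected to have the same dimension.
	let len = a.length;
	let sum = 0;
	// Accumulate the squared difference of each pair of components.
	for(let i = 0; i < len; i++) {
		const tmp = a[i] - b[i];
		sum += tmp * tmp;
	}
	// No square root is taken, so the result is the squared distance.
	return sum;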
}};
/
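The body of the function can be tried outside the database before it is wrapped in a call specification. The following is a minimal sketch, not part of the call specification itself: it assumes that VECTOR arguments behave like JavaScript typed arrays (Float32Array is used here), and the name euclideanSqDistance and the dimension check are illustrative additions that do not appear in the definition above.

```javascript
// Standalone sketch of the distance logic, runnable in any JavaScript runtime.
// Assumption: VECTOR arguments behave like typed arrays such as Float32Array.
function euclideanSqDistance(a, b) {
  // Illustrative guard (not in the database function): reject mismatched dimensions.
  if (a.length !== b.length) {
    throw new RangeError("vectors must have the same dimension");
  }
  let sum = 0;
  for (let i = 0; i < a.length; i++) {
    const diff = a[i] - b[i];
    sum += diff * diff;
  }
  return sum;
}

const p = new Float32Array([1, 2]);
const q = new Float32Array([4, 6]);
console.log(euclideanSqDistance(p, q)); // 25, that is 3*3 + 4*4
```

Because the function reads only its two inputs and returns a number, the same inputs always yield the same result, which is what the DETERMINISTIC keyword asserts.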

Custom distance functions can be used with HNSW vector indexes. If the degree of parallelism in the vector index is greater than 1, the custom distance function must include the PARALLEL_ENABLE clause. Upon index creation, a custom distance can be specified by name in the DISTANCE clause. In queries, the custom distance can be used in the ORDER BY clause and in the SELECT list. The distance function tied to a vector index can be viewed by querying the VECSYS.VECTOR$INDEX view.

When a custom distance function is used to create an HNSW index, the PURE keyword must be specified in the MLE call specification. The PURE clause indicates that the JavaScript program should be run in a restricted execution context, which guarantees that the code will not modify stateful objects, such as database tables or PL/SQL packages, regardless of the database privileges currently in effect. A user-defined function that implements a custom distance metric only performs computations on its inputs and therefore does not require access to database state. Restricted contexts provide an extra layer of security by prohibiting unwanted database modifications. For more information on restricted execution contexts and the PURE keyword, see Oracle Database JavaScript Developer's Guide.

To use a vector index that depends on a custom distance function, you must have EXECUTE privileges on the function specified during index creation. You must also have EXECUTE privileges on JAVASCRIPT. For vector indexes, only definer's rights are supported.

If the distance function is modified, the associated vector index is left in an UNUSABLE state.

Note:

The use of custom distance functions with IVF indexes is not currently supported.

Using the previously created custom distance function, euclidean_sq_vector_distance, first create a table, populate it, and create a vector index:

CREATE TABLE custom_dist_tab( id NUMBER, data_vector VECTOR(2, FLOAT32));

INSERT INTO custom_dist_tab VALUES (1, vector('[1.1,2.2]', 2, float32));
INSERT INTO custom_dist_tab VALUES (2, vector('[2.2,3.3]', 2, float32));
INSERT INTO custom_dist_tab VALUES (3, vector('[3.3,4.4]', 2, float32));
INSERT INTO custom_dist_tab VALUES (4, vector('[4.4,5.5]', 2, float32));
INSERT INTO custom_dist_tab VALUES (5, vector('[5.5,6.6]', 2, float32));

CREATE VECTOR INDEX cust_dist_idx_hnsw ON custom_dist_tab (data_vector)
ORGANIZATION INMEMORY
NEIGHBOR GRAPH WITH TARGET ACCURACY 95
DISTANCE CUSTOM EUCLIDEAN_SQ_VECTOR_DISTANCE
PARALLEL 3;

The custom distance function can be referenced in similarity search queries in the ORDER BY clause or in the SELECT list:

SELECT data_vector 
FROM custom_dist_tab
ORDER BY euclidean_sq_vector_distance(data_vector, VECTOR('[1, 2]'))
FETCH FIRST 5 ROWS ONLY;

SELECT
  data_vector,
  euclidean_sq_vector_distance(data_vector, VECTOR('[1, 2]')) edist
FROM custom_dist_tab
ORDER BY edist
FETCH FIRST 5 ROWS ONLY;
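To sanity-check the ordering that these queries should produce, the same ranking can be computed exactly outside the database. This is a rough sketch under stated assumptions: the rows array mirrors the INSERT statements above, the vectors are held as Float32Array values to approximate FLOAT32 storage, and the names squaredDistance and ranked are hypothetical. An index-backed search is approximate (the index was created with a target accuracy of 95), so its results are not guaranteed to match an exact ranking in general.

```javascript
// Rows mirror the INSERT statements for custom_dist_tab.
const rows = [
  { id: 1, vec: new Float32Array([1.1, 2.2]) },
  { id: 2, vec: new Float32Array([2.2, 3.3]) },
  { id: 3, vec: new Float32Array([3.3, 4.4]) },
  { id: 4, vec: new Float32Array([4.4, 5.5]) },
  { id: 5, vec: new Float32Array([5.5, 6.6]) },
];

// Query vector used in the SELECT statements above.
const query = new Float32Array([1, 2]);

// Same computation as the custom distance function body.
function squaredDistance(a, b) {
  let sum = 0;
  for (let i = 0; i < a.length; i++) {
    const diff = a[i] - b[i];
    sum += diff * diff;
  }
  return sum;
}

// Exact counterpart of ORDER BY edist FETCH FIRST 5 ROWS ONLY.
const ranked = rows
  .map((r) => ({ id: r.id, edist: squaredDistance(r.vec, query) }))
  .sort((x, y) => x.edist - y.edist)
  .slice(0, 5);

// Rows come out in id order 1 through 5, nearest first.
for (const r of ranked) {
  console.log(r.id, r.edist);
}
```

The printed distances carry small FLOAT32 rounding effects, so they are close to, but not exactly, the values obtained from decimal arithmetic.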

See Also: