SPARSE Vectors
The storage format of a vector can be specified as SPARSE
or DENSE
. Sparse vectors are vectors that typically have a large number of
dimensions but with very few non-zero dimension values, while Dense vectors are vectors
where every dimension stores a value, including zero-values.
Sparse vectors can be generated by Sparse Encoding models such as SPLADE or BM25. Generally speaking, sparse models such as SPLADE outperform dense models, such as BERT and All-MiniLM, in keyword awareness search. They are also widely used for Hybrid Vector Search by combining sparse and dense vectors.
Conceptually, a sparse vector can be thought of as a vector where every dimension corresponds to a keyword in a certain vocabulary. For a given document, the sparse vector contains non-zero dimension values representing the number of occurrences for the keywords within that document. For example, BERT has a vocabulary size of 30,522 and several sparse encoders generate vectors of this dimensionality.
Representing a dense vector with 30,522 dimensions with only 100 non-zero
FLOAT32
dimension values would still require 30,522 * 4 =
~120KB of storage. Such a format takes up a lot of space for no reason as
most of the dimension values are 0. This would cause a huge performance deficit
compared to the SPARSE
representation of such vectors.
That is why when using SPARSE
vectors, only the
non-zero dimension values are physically stored.
Here is an example of creating and inserting a SPARSE
vector:
DROP TABLE my_sparse_tab PURGE;
CREATE TABLE my_sparse_tab (v01 VECTOR(5, INT8, SPARSE));
INSERT INTO my_sparse_tab VALUES('[5,[2,4],[10,20]]');
INSERT INTO my_sparse_tab VALUES('[[2,4],[10,20]]');
SELECT * FROM my_sparse_tab;
V01
--------------------
[5,[2,4],[10,20]]
[5,[2,4],[10,20]]
You can see the difference between the SPARSE
textual
form within the INSERT
statement with the one used for a
DENSE
vector. The SPARSE
textual form looks
like:
'[Total Dimension Count, [Dimension Index Array], [Dimension
Value Array]]'
The example uses the number of dimensions in total (5 here but it is optional to specify it in this case as it is defined in the column's declaration), then gives the list of coordinates that have non-zero values, then the list of the corresponding values. In this example, coordinate 2 has the value 10 and coordinate 4 has the value 20. Coordinates 1, 3, and 5 have the value 0.
It is not permitted to use a DENSE
textual form for
SPARSE
vectors and vice versa. However, it is possible to use
vector functions to transform one into the other as illustrated in the following
sample code using the table my_sparse_tab
(created in the previous
snippet):
The following INSERT
statement fails:
INSERT INTO my_sparse_tab VALUES('[0, 10, 0, 20, 0]');
Error starting at line : 1 in command -
INSERT INTO my_sparse_tab VALUES ('[0,10,0,20,0]')
Error at Command Line : 1 Column : 33
Error report -
SQL Error: ORA-51833: Textual input conversion between sparse and dense vector is not
supported.
However, this insertion works:
INSERT INTO my_sparse_tab VALUES (TO_VECTOR('[0,0,10,0,20]', 5, INT8, DENSE));
SELECT * FROM my_sparse_tab;
V01
____________________
[5,[2,4],[10,20]]
[5,[2,4],[10,20]]
[5,[3,5],[10,20]]
You can also transform a SPARSE
vector into a
DENSE
textual form if needed and vice versa:
SELECT FROM_VECTOR(v01 RETURNING CLOB FORMAT DENSE)
FROM my_sparse_tab
WHERE ROWNUM<2;
FROM_VECTOR(V01RETURNINGCLOBFORMATDENSE)
___________________________________________
[0,10,0,20,0]
Note:
TheRETURNING
clause used in the preceding example can also return a VARCHAR2
or
a BLOB
.
Parent topic: Create Tables Using the VECTOR Data Type