SUM

The SUM function takes a vector expression as input and returns the sum as a vector with format FLOAT64.

Purpose

SUM is mainly used, in Natural Language Processing (NLP), to compute a representation of a sentence or a document. It is common to sum the word embeddings of all words. The resulting vector represents the entire text, allowing models to work with sentences instead of just words.

The following diagram illustrates the vector representation, in a 2D space, of word embeddings. If you take S1, "The astronomer observed the stars", and S2, "The stargazer watched the celestial bodies", you can observe that the corresponding vector sum of each constituting word of each sentence (S1 and S2 in the diagram) are very close, for example, in terms of cosine similarity. This means that the two sentences are very similar in meaning.



The result of SUM with a vector expression is the equivalent of consecutively performing vector addition operations on all non-NULL inputs. The returned vector has the same number of dimensions as the input and has the format FLOAT64. When the expression has a flexible number of dimensions, all inputs must have the same number of dimensions within each aggregate group.

The SUM function with vector expressions as input can be used as a single set aggregate or in the GROUP BY clause. Using ROLLUP is also supported. The SUM function accepts vector expressions as input for aggregate operations but cannot currently be applied to analytic operations.

NULL vectors are ignored and are not counted when calculating the sum vector. If all inputs within an aggregate group are NULL, the result is NULL for that group. If the result overflows the FLOAT64 maximum value, an error is raised, regardless of the format of the input vector type.

With vector inputs, using DISTINCT, CUBE, and GROUPING SETS is not supported. Also, BINARY and SPARSE vectors cannot be supplied as input.

For the full definition and implementation of the SUM function, see Oracle Database SQL Language Reference.

CREATE TABLE sum_t (v VECTOR, k1 NUMBER, k2 VARCHAR2(100));
INSERT INTO sum_t VALUES ('[2, 4, 6]', 2, 'even');
INSERT INTO sum_t VALUES ('[8, 10, 12]', 2, 'even');
INSERT INTO sum_t VALUES ('[1, 3, 5]', 3, 'odd');
INSERT INTO sum_t VALUES ('[7, 9, 11]', 3, 'odd');


SELECT SUM(v) v_sum FROM sum_t;

V_SUM
---------------------------------------------------
[1.8E+001,2.6E+001,3.4E+001]


SELECT SUM(v) v_sum, k1 FROM sum_t GROUP BY k1;

V_SUM                                    K1
---------------------------------------  ----------
[1.0E+001,1.4E+001,1.8E+001]             2
[8.0E+000,1.2E+001,1.6E+001]             3


SELECT SUM(v) v_sum FROM sum_t GROUP BY ROLLUP(k1, k2);

V_SUM
---------------------------------------------------
[1.0E+001,1.4E+001,1.8E+001]
[1.0E+001,1.4E+001,1.8E+001]
[8.0E+000,1.2E+001,1.6E+001]
[8.0E+000,1.2E+001,1.6E+001]
[1.8E+001,2.6E+001,3.4E+001]
CREATE TABLE sum_diff_dim_t (v VECTOR, k1 NUMBER, k2 VARCHAR2(100));
INSERT INTO sum_diff_dim_t VALUES ('[2, 4, 6]', 2, 'even');
INSERT INTO sum_diff_dim_t VALUES ('[8, 10, 12]', 2, 'even');
INSERT INTO sum_diff_dim_t VALUES ('[1, 3, 5, 7]', 3, 'odd');
INSERT INTO sum_diff_dim_t VALUES ('[9, 11, 13, 15]', 3, 'odd');


SELECT SUM(v) v_sum, k2 FROM sum_diff_dim_t GROUP BY k2;

V_SUM                                    K2
---------------------------------------  ----------
[1.0E+001,1.4E+001,1.8E+001]             even
[1.0E+001,1.4E+001,1.8E+001,2.2E+001]    odd


SELECT SUM(v) v_sum FROM sum_diff_dim_t;

ERROR:
ORA-51808: SUM(vector) requires all vectors to have the same dimension count.
Encountered (3, 4).