Examples of Using ONNX Models

Iris is a flower and this data set has information such as petal length, sepal length, petal width, and sepal width collected from three types of Iris flowers: sentosa, versicolour, and virginica.

ONNX Classification Examples

The following examples showcase various JSON metadata parameters that can be defined for ONNX models.

Example: Specifying JSON Metadata for Classification Models

The following example illustrates JSON metadata parameters with Classification as the function. Assume the model has an output named probabilities for the probability of the prediction. To use the PREDICTION_PROBABILITY scoring function, you must set the field classificationProbOutput to the name of the model output that holds the probability.

BEGIN 
DBMS_VECTOR.LOAD_ONNX_MODEL('classification_model.onnx', 'doc_model', 
JSON('{"function" : "classification",
       "classificationProbOutput": "probabilities"}'));
END;
/

Example: Specifying labels in JSON Metadata for Classification Models

The following example illustrates how you can specify custom labels in the JSON metadata.

BEGIN 
DBMS_VECTOR.LOAD_ONNX_MODEL('classification_model.onnx', 'doc_model',
JSON('{"function" : "classification",
	   "classificationProbOutput": "probabilities",
	   "labels": ["Setosa", "Versicolour", "Virginica"]}'));
END;
/

You can use the PREDICTION and PREDICTION_PROBABILITY functions for inference or scoring:

SELECT
    iris.*,
    PREDICTION(doc_model USING *) as predicted_species_id,
    PREDICTION_PROBABILITY(doc_model, 'setosa' USING *) as setosa_probability
FROM iris;

The query predicts iris species and the probability of setosa species using the iris data set. The data from iris table is used in a SELECT query to predict a species ID and the probability that the species is setosa using a machine learning model named doc_model. The PREDICTION function predicts the species based on the attributes in the table, and the PREDICTION_PROBABILITY function computes the probability that the predicted species is setosa. The result includes all columns from the iris view along with the predicted species ID and the probability of the species being setosa.

Example: Specifying input in JSON Metadata for Classification Models

The following example illustrates how you can specify input attribute names that map to the actual ONNX model input names. This example assumes a model with four inputs named SEPAL_LENGTH, SEPAL_WIDTH, PETAL_LENGTH, and PETAL_WIDTH. You can specify alternative input attribute names using the JSON metadata as shown in this example. Here, each input is assumed to be a tensor with a dimension of 1. The input field must be a JSON object where each field is a model input name (For example, SEPAL_LENGTH), and its value is a JSON array sized according to the tensor's dimension (here, 1) with one attribute name per element in the array.

BEGIN DBMS_VECTOR.LOAD_ONNX_MODEL('classification_model.onnx', 'doc_model', 
JSON('{"function" : "classification", 
       "classificationProbOutput": "probabilities", 
       "input": { "SEPAL_LENGTH": ["SEPAL_LENGTH_CM"], 
                  "SEPAL_WIDTH": ["SEPAL_WIDTH_CM"], 
                  "PETAL_LENGTH": ["PETAL_LENGTH_CM"], 
                  "PETAL_WIDTH": ["PETAL_WIDTH_CM"] } }'));
END;
/

You can also have a different order of the columns as input.

BEGIN DBMS_VECTOR.LOAD_ONNX_MODEL('classification_model.onnx', 'doc_model', 
	JSON('{"function" : "classification", 
		   "classificationProbOutput": "probabilities", 
		   "input": { "SEPAL_WIDTH": ["SEPAL_WIDTH_CM"],
			       "PETAL_LENGTH": ["PETAL_LENGTH_CM"],
			       "PETAL_WIDTH": ["PETAL_WIDTH_CM"],
				"SEPAL_LENGTH": ["SEPAL_LENGTH_CM"] } }'));
END;
/

Example: Specifying a Single input With Four Dimensions

Here is an example where the model has a single input tensor named x with four dimensions. The corresponding JSON metadata for this scenario is:

JSON('{"function" : "classification",
       "classificationProbOutput": "probabilities",
       "input": { "x": ["SEPAL_LENGTH_CM",
                        "SEPAL_WIDTH_CM",
                        "PETAL_LENGTH_CM",
                        "PETAL_WIDTH_CM"]
                     }'));

You can use PREDICTION and PREDICTION_PROBABILITY functions for inference or scoring.

WITH
dummy_iris AS (
    SELECT
    4.5 as petal_length_cm,
    1.5 as petal_width_cm,
    4.3 as sepal_length_cm,
    2.9 as sepal_width_cm
    FROM iris
)
SELECT
    dummy_iris.*,
    PREDICTION(doc_model USING *) as predicted_species_id,
    PREDICTION_PROBABILITY(doc_model 'setosa' USING *) as setosa_probability
FROM dummy_iris;

The query predicts iris species and the probability of setosa species using specified attributes in a temporary data set. The query creates a temporary dummy_iris view with attributes values set. This temporary view is then used in a SELECT query to predict a species ID and the probability that the species is setosa using a machine learning model named doc_model. The PREDICTION function predicts the species based on the attributes provided, and the PREDICTION_PROBABILITY function computes the probability that the predicted species is setosa. The result includes all columns from the dummy_iris view along with the predicted species ID and the probability of the species being setosa.

Example: Specifying defaultOnNull in JSON Metadata for Classification Models

The following examples illustrates how you can specify defaulOnNull provides default values to be used for specific attributes when their values are NULL in the data set. Use the names SEPAL_LENGTH, SEPAL_WIDTH, PETAL_LENGTH, and PETAL_WIDTH as fields in the defaultOnNull object, which are the assumed input attribute names for a ONNX model with four inputs. These names serve as the default input attribute names, so you can use them as fields in the defaultOnNull.

BEGIN DBMS_VECTOR.LOAD_ONNX_MODEL('classification_model.onnx', 'doc_model', 
	JSON('{"function" : "classification", 
		   "classificationProbOutput": "probabilities", 
		   "defaultOnNull": {"SEPAL_LENGTH": "5.1", 
		   "SEPAL_WIDTH": "3.5", 
		   "PETAL_LENGTH": "1.4", 
		   "PETAL_WIDTH": "0.2"}}'));
END;
/

"SEPAL_LENGTH": "5.1": If the sepal length is null, use 5.1 as the default value.
"SEPAL_WIDTH": "3.5": If the sepal width is null, use 3.5 as the default value.
"PETAL_LENGTH": "1.4": If the petal length is null, use 1.4 as the default value.
"PETAL_WIDTH": "0.2": If the petal width is null, use 0.2 as the default value.

Example: Specifying input and defaultOnNull JSON Metadata for Classification Models

Here is a combined example of specifying input and defaultOnNull values. This example uses the values that were illustrated in the earlier examples where input and defaultOnNull values are specified:

JSON('{"function" : "classification",
               "classificationProbOutput": "probabilities",
               "input": { "SEPAL_WIDTH": ["SEPAL_WIDTH_CM"],
                          "PETAL_LENGTH": ["PETAL_LENGTH_CM"],
                          "PETAL_WIDTH": ["PETAL_WIDTH_CM"],
                          "SEPAL_LENGTH": ["SEPAL_LENGTH_CM"] },
                    "defaultOnNull": {"SEPAL_LENGTH_CM": "5.1",
                                      "SEPAL_WIDTH_CM": "3.5"}}')

ONNX Clustering Examples

The following examples showcase various JSON metadata parameters that can be defined for ONNX models.

Example: Specifying JSON Metadata for Clustering Models

The following example illustrates JSON metadata parameters with Clustering as the function. Assume the model has an output named probabilities for the probability of the prediction. To use the CLUSTER_PROBABILITY scoring function, you must set the field clusteringProbOutput to the name of the model output that holds the probability.

BEGIN
DBMS_VECTOR.LOAD_ONNX_MODEL('clustering_model.onnx','doc_model',
    JSON('{"function": "clustering",
           "clusteringProbOutput": "probabilities"
        }
    ')
);
END;
/

You can use CLUSTER_ID and CLUSTER_PROBABILITY functions for inference or scoring.

SELECT
    iris.*,
    CLUSTER_ID(doc_model USING *) as cluster_id,
    CLUSTER_PROBABILITY(doc_model, 1 USING *) as cluster_1_probability
FROM iris;

This query predicts the cluster assignments and the probabilities of belonging to a specific cluster for each record of the iris data set. The query retrieves all columns of each record (iris.*) and applies the clustering model named doc_model to each record of the iris data set and predicts the cluster ID. The USING * clause tells the model to use all available columns in the iris table for this prediction. The CLUSTER_PROBABILITY(doc_model, 1 USING *) as cluster_1_probability part of the query calculates the probability that each record belongs to cluster 1, according to the doc_model from the iris data set. This provides insights into how likely each record is to be part of cluster 1, giving a quantitative measure of membership strength.

Example: Specifying clusteringDistanceOutput in JSON Metadata for Clustering Models

The following example illustrates how you can specify clusteringDistanceOutput and for ONNX Clustering models.

In this model, an output tensor named distances provides distances for the input, which is a single tensor named float_input with a dimension of 4. The JSON metadata input field must map attribute names to entries of the tensor, such as "SEPAL_LENGTH", "SEPAL_WIDTH", "PETAL_LENGTH", "PETAL_WIDTH".

BEGIN 
DBMS_VECTOR.LOAD_ONNX_MODEL('clustering_model.onnx', 'doc_model', 
JSON('{"function" : "clustering", 
"clusteringDistanceOutput": "distances", 
"normalizeProb": "softmax", 
"input": { "float_input": ["SEPAL_LENGTH", "SEPAL_WIDTH", "PETAL_LENGTH", "PETAL_WIDTH"] } 
      }')
);
END;
/

You can use CLUSTER_DISTANCE function for inference or scoring. These SQL queries utilize clustering models to predict cluster distances from the IRIS data set.

SELECT CLUSTER_DISTANCE(doc_model USING *) AS predicted_target_value,
CLUSTER_DISTANCE (doc_model,1 USING *) AS dist1,
CLUSTER_DISTANCE (doc_model,2 USING *) AS dist2,
CLUSTER_DISTANCE (doc_model,3 USING *) AS dist3
FROM IRIS
ORDER BY ID
FETCH NEXT 10 ROWS ONLY;

Here, the query focuses on understanding the physical distance of data points from cluster centroids, which is particularly useful for identifying outliers or for performing detailed cluster analysis. The query calculates the distance of each record in the IRIS data set from the centroids of different clusters using the doc_model. The USING * syntax indicates that the model must use all available columns of the IRIS data set for making the prediction. CLUSTER_DISTANCE(doc_model, n USING *) computes the distance from cluster n (n being 1, 2, and 3 in this query). Each distance is selected as a separate column (dist1, dist2, dist3).

The output is limited to the first 10 rows of the result set ordered by the ID column of the IRIS table.

Example: Specifying clusteringProbOutput and normalizeProb in JSON Metadata for Clustering Models

The following example illustrates how you can specify clusteringProbOutput and normalizeProb for ONNX Clustering models.

BEGIN 
DBMS_VECTOR.LOAD_ONNX_MODEL('clustering_model.onnx', 'doc_model',
               JSON('{"function" :
			   "clustering",
			   "clusteringProbOutput": "probabilities",
                        "normalizeProb" : "softmax",
			   "input": { "float_input": ["SEPAL_LENGTH", "SEPAL_WIDTH", "PETAL_LENGTH", "PETAL_WIDTH"] } }')
      );
END;
/

You can use CLUSTER_PROBABILITY and CLUSTER_SET functions for inference or scoring:

SELECT CLUSTER_ID (doc_model USING *) AS predicted_target_value,
                    CLUSTER_PROBABILITY (doc_model,1 USING *) AS prob1,
                    CLUSTER_PROBABILITY (doc_model,2 USING *) AS prob2,
                    CLUSTER_PROBABILITY (doc_model,3 USING *) AS prob3
                    FROM IRIS
                    ORDER BY ID
                    FETCH NEXT 10 ROWS ONLY;

In this case, a clustering model is used to predict the cluster IDs and associated probabilities for records from the IRIS data set. Because the JSON metadata specifies softmax for the normalizeProb field, the model applies softmax normalization to the probabilities before returning them as the result of the CLUSTER_PROBABILITY scoring operator.

The SQL query selects CLUSTER_ID column from the IRIS table and adds a new column, predicted_target_value, which contains predictions made by the doc_model. The USING * syntax means that all columns of the current row are used as input features for the doc_model model to predict the value as predicted_target_value. The result of this prediction is then included as a new column in the output of the query.

CLUSTER_PROBABILITY(model, n USING *): Computes the probability that the record belongs to cluster n (n being 1, 2, and 3 in this query). This is done for three different clusters, and each probability is selected as a separate column (prob1, prob2, prob3).

The output is limited to the first 10 rows of the result set ordered by the ID column of the IRIS table.


SELECT S.CLUSTER_ID, S.PROBABILITY
 FROM (SELECT CLUSTER_SET(doc_model USING *) pset
          FROM IRIS ORDER BY ID) T,
       TABLE(T.pset) S
FETCH NEXT 10 ROWS ONLY;

The CLUSTER_SET query generates a set of cluster data using the doc_model. The resultant column pset represents all possible cluster assignments for each record, which includes cluster IDs and their respective probabilities ordered by the ID column. The SELECT S.CLUSTER_ID, S.PROBABILITY part of the query selects the cluster ID and probability from the resultant column set. The output is limited to the first 10 rows of the result set.

ONNX Regression Examples

The following examples showcase various JSON metadata parameters that can be defined for ONNX Regression models. All examples assume an ONNX model that has one output named regressionOutput and four input tensors of dimension 1 whose name match exactly the name of the IRIS table columns, namely, SEPAL_LENGTH, SEPAL_WIDTH, PETAL_LENGTH, PETAL_WIDTH.

Example: Specifying JSON Metadata for Regression Models

The following is a simple example illustrating JSON metadata parameters with Regression as the function. Assume the ONNX model features one output named regressionOutput and four input tensors of dimension 1, whose names match exactly after the IRIS table columns ("SEPAL_LENGTH", "SEPAL_WIDTH", "PETAL_LENGTH", "PETAL_WIDTH"). The JSON metadata can be as simple as the following:

BEGIN DBMS_VECTOR.LOAD_ONNX_MODEL(
    'regression_model.onnx',
    'doc_model',
    JSON('{"function": "regression"}
    ')
);
END;
/

You can use the PREDICTION function for inference or scoring:

SELECT
    iris.*,
    PREDICTION(doc_model USING *) as predicted_petal_width_cm
FROM iris;

In this case, the SQL query selects all columns from the iris table and adds a new column, predicted_petal_width_cm, which contains predictions made by the doc_model. The USING * syntax means that all columns of the current row are used as input features for the doc_model model to predict the value of PETAL_WIDTH as predicted_petal_width_cm. The result of this prediction is then included as a new column in the output of the query.

Example: Specifying input and defaultOnNull in JSON Metadata for Regression Models

The following example illustrates how you can specify input attribute names that map to the actual ONNX model input names. The defaulOnNull providing default values to be used for specific attributes when their values are NULL in the data set.

BEGIN DBMS_VECTOR.LOAD_ONNX_MODEL('regression_model.onnx','doc_model',
    JSON('{"function": "regression",
            "input": {
                "SEPAL_LENGTH": ["dummy_sepal_length_cm"],
                "SEPAL_WIDTH": ["dummy_sepal_width_cm"]
                      },
            "defaultOnNull": {
                "dummy_sepal_length_cm": "5.1",
                "dummy_sepal_width_cm": "3.5",
                 }
           }
        ')
);
END;
/

You can use the PREDICTION function for inference or scoring.

WITH
dummy_iris AS (
    SELECT
    (CASE WHEN petal_length > 5 THEN 4.9 ELSE NULL END)
        as dummy_sepal_length_cm,
    (CASE WHEN petal_length < 4 THEN 2.5 ELSE NULL END)
        as dummy_sepal_width_cm,
    petal_length
    petal_width
    FROM iris
)
SELECT
    dummy_iris.*,
    PREDICTION(doc_model USING *) as predicted_petal_width_cm
FROM dummy_iris;

In this case, a temporary dummy_iris table is created with three columns: dummy_sepal_length_cm, dummy_sepal_width_cm, and petal_length. The values of the dummy_sepal_length_cm and dummy_sepal_width_cm are based on petal_length values of the iris table. If petal_length is greater than 5, dummy_sepal_length_cm is set to 4.9, otherwise it is NULL. If petal_length is less than 4, dummy_sepal_width_cm is set to 2.5, otherwise it remains NULL.

Then the SELECT query retrieves all columns from the dummy_iris table and uses the doc_model to predict petal_width, adding this prediction as a new column named predicted_petal_width_cm. The model uses the derived dummy columns, petal_length and petal_width for its predictions.