39.1 About ONNX
ONNX is an open-source format designed for machine learning models. It ensures cross-platform compatibility. This format also supports major languages and frameworks, facilitating efficient model exchange.
The ONNX format allows for model serialization. Models are represented as graphs of common machine learning operations, and these graphs are saved in a portable format called protocol buffers. This simplifies the exchange of models across platforms, including cloud, web, edge, and mobile experiences on Microsoft Windows, Linux, Mac, iOS, and Android. ONNX models can also be exported and imported in many languages, such as Python, C++, C#, and Java. The ONNX format is useful for compute-heavy tasks such as training machine learning models and data processing that often uses trained models. Many leading machine learning development frameworks, such as TensorFlow, PyTorch, and scikit-learn, can convert models into the ONNX format.
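As a rough illustration of the graph representation described above, the toy sketch below models a three-node graph as plain Python dictionaries and evaluates the nodes in order. It is not the ONNX protocol buffer schema or the ONNX Runtime. The operator names Mul, Add, and Relu match standard ONNX operators, but the dictionary layout, the `run_graph` helper, and the use of scalars instead of tensors are assumptions made for brevity.

```python
# Each node names an operation, its input values, and the value it produces.
# Real ONNX models store this structure as protocol buffers; plain dicts are
# used here only to show the idea (hypothetical layout).
GRAPH = [
    {"op": "Mul", "inputs": ["x", "w"], "output": "xw"},
    {"op": "Add", "inputs": ["xw", "b"], "output": "y"},
    {"op": "Relu", "inputs": ["y"], "output": "out"},
]

# Scalar stand-ins for the tensor operations of the same name.
OPERATIONS = {
    "Mul": lambda a, b: a * b,
    "Add": lambda a, b: a + b,
    "Relu": lambda a: max(a, 0.0),
}


def run_graph(graph, feeds):
    """Evaluate nodes in listed order, storing each result under its output name."""
    values = dict(feeds)
    for node in graph:
        args = [values[name] for name in node["inputs"]]
        values[node["output"]] = OPERATIONS[node["op"]](*args)
    return values


result = run_graph(GRAPH, {"x": 2.0, "w": 3.0, "b": 1.0})
print(result["out"])
```

Because the graph is only data, any runtime that understands the listed operations can evaluate it, which is the property that makes a serialized model portable.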
Once you represent the models in the ONNX format, you can run them with the ONNX Runtime. The architecture of the ONNX Runtime is adaptable, enabling providers to modify or enhance how some operations are implemented to make better use of particular hardware, such as Graphics Processing Units (GPUs), Single Instruction Multiple Data (SIMD) instruction sets, or specialized libraries. To learn more about ONNX Runtime, see https://onnxruntime.ai/docs/.
The ONNX Runtime integration with Oracle Database allows for the import of ONNX-formatted models, including embedding models. To support embedding models, Oracle Machine Learning has introduced a machine learning technique called embedding. If you do not have a pretrained model in ONNX format, Oracle offers a Python utility package that automates the conversion. It downloads a pretrained model, converts the model to ONNX format augmented with pre-processing and post-processing operations, and imports the ONNX format model to Oracle Database. For more information on the Python utility tool, see Convert Pretrained Models to ONNX Format.
Oracle supports ONNX Runtime version 1.15.1.
39.1.1 Supported Machine Learning Functions for ONNX Runtime
Describes the supported machine learning functions to import pretrained models and perform scoring.
The following are the supported machine learning functions:
- Classification
- Clustering
- Embedding
- Regression
39.1.2 Supported Attribute Data Types
Discover the supported ONNX input data types mapped to SQL data types.
Data Type | SQL Type | Supported ONNX Data Type |
---|---|---|
Numerical | | float, int8, int16, int32, int64, uint8, uint16, uint32, uint64 |
Categorical | | For |
Text | | string |
Vectors | | |
The following data types are not supported:
- complex64, complex128
- float16, bfloat16
- fp8
- int4, uint4
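Before importing a model, it can be handy to check its input types against the lists above. The sketch below is a minimal helper that does only that. The function names and the `{input_name: type_name}` dictionary shape are hypothetical, and the type names are copied from this section as written rather than from the ONNX specification.

```python
# Supported and unsupported ONNX element types, copied from this section.
SUPPORTED_NUMERICAL = {
    "float", "int8", "int16", "int32", "int64",
    "uint8", "uint16", "uint32", "uint64",
}
SUPPORTED_TEXT = {"string"}
UNSUPPORTED = {
    "complex64", "complex128", "float16", "bfloat16", "fp8", "int4", "uint4",
}


def classify_onnx_input_type(type_name: str) -> str:
    """Return 'numerical', 'text', 'unsupported', or 'unknown' for a type name."""
    name = type_name.strip().lower()
    if name in SUPPORTED_NUMERICAL:
        return "numerical"
    if name in SUPPORTED_TEXT:
        return "text"
    if name in UNSUPPORTED:
        return "unsupported"
    # Types this section does not name are reported as unknown, not guessed.
    return "unknown"


def find_unsupported_inputs(inputs: dict) -> list:
    """Given {input_name: type_name}, list the inputs whose type is unsupported."""
    return sorted(
        name for name, type_name in inputs.items()
        if classify_onnx_input_type(type_name) == "unsupported"
    )


# Hypothetical model inputs used only for illustration.
print(find_unsupported_inputs({"age": "int64", "score": "float16", "note": "string"}))
```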
39.1.3 Supported Target Data Types
Discover the supported ONNX target data types mapped to SQL data types.
Depending on the machine learning function, different scoring functions are used. Different scoring functions for the same machine learning function can produce different data types. A few points to note:
- Classification models have different rules to determine the type of PREDICTION function to be used. If you are using PREDICTION_PROBABILITY, then BINARY_DOUBLE is returned. See labels in JSON Metadata Parameters for ONNX Models.
- For an embedding model, the VECTOR_EMBEDDING function returns a VECTOR type.
- For a regression model, VARCHAR is not a valid target type and BINARY_DOUBLE is returned.
- For a clustering model, if you are using CLUSTER_PROBABILITY or CLUSTER_DISTANCE, then BINARY_DOUBLE is returned.
To learn more, see JSON Metadata Parameters for ONNX Models.
Machine Learning Function | SQL Function | SQL Type | Supported ONNX Target Output |
---|---|---|---|
Regression | | BINARY_DOUBLE | regressionOutput |
Classification | | | |
Classification | | NUMBER | |
Classification | | BINARY_DOUBLE | |
Classification | | set of (NUMBER, BINARY_DOUBLE) | NA |
Clustering | | BINARY_DOUBLE | |
Clustering | | BINARY_DOUBLE | |
Clustering | | set of (NUMBER, BINARY_DOUBLE) | NA |
Embedding | | VECTOR(float32, n) | embeddingOutput |
39.1.4 Custom ONNX Runtime Operations
To customize a pretrained embedding model with pre-processing and post-processing operations, Oracle supports tokenization as a pre-processing operation, and pooling and normalization as post-processing operations. These are supported as custom ONNX Runtime operations for version 1.15.1.
Oracle offers a Python utility that augments a pretrained model with tokenization, pooling, and normalization, and converts the augmented model to ONNX format. Models using any other custom operations will fail on import. For details on how to use the Python utility, see Convert Pretrained Models to ONNX Format.
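To make the two post-processing steps concrete, here is a minimal pure-Python sketch of pooling followed by normalization. This section does not say which pooling or normalization variant is applied, so mean pooling over unmasked tokens and L2 normalization are assumptions, chosen because they are common for sentence-embedding models. The function names and sample values are hypothetical.

```python
import math


def mean_pool(token_embeddings, attention_mask):
    """Average the token vectors whose mask value is 1 (padding is skipped)."""
    dim = len(token_embeddings[0])
    totals = [0.0] * dim
    count = 0
    for vector, keep in zip(token_embeddings, attention_mask):
        if keep:
            count += 1
            for i, value in enumerate(vector):
                totals[i] += value
    if count == 0:
        raise ValueError("attention_mask selects no tokens")
    return [total / count for total in totals]


def l2_normalize(vector):
    """Scale a vector to unit Euclidean length; a zero vector is returned as-is."""
    norm = math.sqrt(sum(v * v for v in vector))
    if norm == 0.0:
        return list(vector)
    return [v / norm for v in vector]


# Three token vectors; the last one is padding and is masked out.
tokens = [[1.0, 6.0], [5.0, 2.0], [9.0, 9.0]]
mask = [1, 1, 0]
pooled = mean_pool(tokens, mask)
embedding = l2_normalize(pooled)
print(pooled, embedding)
```

Pooling collapses a variable number of token vectors into one fixed-length vector, and normalization puts all embeddings on the same scale, which is why both are bundled with the model rather than left to the caller.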
39.1.5 Use PL/SQL Packages to Import Models
Use the DBMS_DATA_MINING.IMPORT_ONNX_MODEL procedure or the DBMS_VECTOR.LOAD_ONNX_MODEL procedure to import ONNX format models. You can then use the imported ONNX format models through a scoring function run by the in-database ONNX Runtime.
- To import a pretrained ONNX format model, use IMPORT_ONNX_MODEL Procedure or LOAD_ONNX_MODEL Procedure.
- To drop an ONNX model, use DROP_ONNX_MODEL. See also DROP_MODEL procedure.
- A complete step-by-step example that illustrates these procedures is in Import ONNX Models and Generate Embeddings.
Note:
In-database embedding models must include tokenization and postprocessing. Providing only the core ONNX model is insufficient, as users would need to handle tokenization externally, pass tensors into the SQL operator, and convert output tensors into vectors.
The DBMS_DATA_MINING.RENAME_MODEL procedure is also supported.
Most of the existing Oracle Machine Learning for SQL APIs are available to the ONNX models. As partitioning is not applicable for external pretrained models, ONNX models do not support the following procedures:
- ADD_PARTITION
- DROP_PARTITION
- ADD_COST_MATRIX
- REMOVE_COST_MATRIX
39.1.6 Supported SQL Scoring Functions
Lists the supported scoring functions for in-database scoring of machine learning models imported in the ONNX format.
Machine Learning Technique | Operator | Supported | Return Type |
---|---|---|---|
Embedding | VECTOR_EMBEDDING | always | VECTOR(<dimensions , FLOAT32>) The number of dimensions of the output vector of a |
Regression | PREDICTION | always | Data type of the target. For regression, the data type is converted to BINARY_DOUBLE SQL type. |
Classification | PREDICTION | always | Data type of the target. |
Classification | PREDICTION_PROBABILITY | always | BINARY_DOUBLE |
Classification | PREDICTION_SET | always | Set of (t, NUMBER, BINARY_DOUBLE) where t is the data type of the target. |
Clustering | CLUSTER_ID | only if | NUMBER |
Clustering | CLUSTER_PROBABILITY | only if clusteringProbOutput is specified | BINARY_DOUBLE |
Clustering | CLUSTER_SET | only if clusteringProbOutput is specified | Set of (NUMBER, BINARY_DOUBLE) |
Clustering | CLUSTER_DISTANCE | only if clusteringDistanceOutput is specified | BINARY_DOUBLE |
Note:
You can define the outputs explicitly in the metadata or implicitly.
- The metadata must explicitly specify how to find the result in the model output for some SQL scoring functions. For example, CLUSTER_PROBABILITY is supported only if clusteringProbOutput is specified in the metadata.
- The system automatically assumes the output for a model with only one output if you don't specify it in the metadata.
- If a scoring function does not comply with the description provided, you will receive an ORA-40290 error when performing the scoring operation on your data. Additionally, any unsupported scoring functions will raise the ORA-40290 error.
To learn more about classification data types that are returned, see labels and classificationLabelOutput in JSON Metadata Parameters for ONNX Models.
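The "Supported" column and the note above can be read as a small rule set over the model metadata. The sketch below encodes those rules in Python to show how the metadata determines which scoring functions are usable. It is an illustration, not Oracle's implementation: the `"function"` key used to name the technique, the helper name, and the output name `"probabilities"` are assumptions, and CLUSTER_ID is left out because its condition is not spelled out in this section.

```python
import json

# Scoring functions the table lists as "always" supported, per technique.
ALWAYS_SUPPORTED = {
    "embedding": ["VECTOR_EMBEDDING"],
    "regression": ["PREDICTION"],
    "classification": ["PREDICTION", "PREDICTION_PROBABILITY", "PREDICTION_SET"],
    "clustering": [],
}

# Clustering functions that depend on a metadata parameter named in the table.
# CLUSTER_ID is omitted: its condition is not given in this section.
CLUSTERING_CONDITIONAL = {
    "CLUSTER_PROBABILITY": "clusteringProbOutput",
    "CLUSTER_SET": "clusteringProbOutput",
    "CLUSTER_DISTANCE": "clusteringDistanceOutput",
}


def supported_scoring_functions(metadata_json: str) -> list:
    """List the scoring functions the table implies for a metadata document."""
    metadata = json.loads(metadata_json)
    # ASSUMPTION: the technique is stored under a "function" key.
    technique = metadata["function"].lower()
    if technique not in ALWAYS_SUPPORTED:
        raise ValueError(f"unknown machine learning function: {technique}")
    functions = list(ALWAYS_SUPPORTED[technique])
    if technique == "clustering":
        for name, required_key in CLUSTERING_CONDITIONAL.items():
            if required_key in metadata:
                functions.append(name)
    return functions


# Hypothetical metadata; the output name "probabilities" is made up.
example = json.dumps({"function": "clustering", "clusteringProbOutput": "probabilities"})
print(supported_scoring_functions(example))
```

With this metadata, CLUSTER_DISTANCE is absent from the result because clusteringDistanceOutput is not specified; per the note, calling it would raise the ORA-40290 error.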
Cost Matrix Clause
Specify a cost matrix directly within the PREDICTION and PREDICTION_SET scoring functions. To learn more about Cost Matrix, see Oracle Machine Learning for SQL Concepts.