Image Transformer ONNX Format Models

Image transformer is a part of machine learning that helps computers interpret and analyze images and videos. It provides tools to perform tasks like creating image embeddings (using an image transformer), classifying objects, detecting anomalies, and identifying objects in pictures or videos.

Image transformers don’t directly use images as input. They need pre-processing to convert images into a form the model can understand. Common pre-processing steps include:

Decoding images from formats like JPEG to a 3D numeric array.
Resizing images to standard dimensions.
Normalizing pixel values.
Reducing noise in the image.
Cropping parts of the image for focus.

Image transformer models can be converted into the ONNX format and used directly in Oracle Database. Each image transformer requires its own specific pre-processing pipeline and Oracle offers OML4Py pre-processing pipeline for such models.

Pretrained Image Transformer Models in Oracle Database

Oracle Database supports using pretrained image transformer models for generating vectors for semantic similarity search.

You can access image transformer models through machine learning platforms like Hugging Face that provide pretrained models for immediate use.

To use pretrained image transformer models in Oracle Database, here are the high-level steps:

Download pretrained models: Download image transformer models into the database.
Convert image transformer model to ONNX format: Use ONNX pipeline to convert the pretrained image transformer model to ONNX format. Add image pre-processing by implementing Oracle's custom ONNX operation for image decoding and create a model-specific ONNX pre-processing pipeline. See Import Pretrained Models in ONNX Format for Vector Generation Within the Database for more details.
Import ONNX format image transformer model: Use the DBMS_VECTOR.LOAD_ONNX_MODEL procedure or DBMS_DATA_MINING.IMPORT_ONNX_MODEL to import the ONNX model into your Oracle database. After importing, use the VECTOR_EMBEDDING operator to generate vector embeddings from JPEG images stored as BLOB in the database.

Note:

Only JPEG images are supported. Multiple ONNX models may have to be loaded for multi-modal model because each modality has a different pre-processing and post-processing pipeline.

The Oracle database supports popular pretrained models such as:

ResNet-50: A widely used model for image classification.
CLIP ViT-Base-Patch32: A multi-modal model for linking text and image content.
ViT Base-Patch: A vision transformer model designed for image analysis and classification.

Example: Generate Embeddings from Image Transformer Models

The following examples illustrate generating embeddings from images with image transformer model using DBMS_VECTOR or DBMS_DATA_MINING packages and use the ONNX Runtime for inference through the SQL prediction operators.

These examples assume that:

the data set is available to the user.
the DM_DUMP directory exists and contains the ONNX model file for image transformer models augmented with image pre-processing. Follow the steps in ONNX Pipeline Models: Image Embedding and ONNX Pipeline Models: Multi-modal Embedding to generate the ONNX files for the ResNet-50 and Clip ViT models. See also Import ONNX Models into Oracle Database End-to-End Example.

Load File Contents into a BLOB

The following example loads the contents of a file stored in a directory object (DM_DUMP) into a BLOB in the database. The function returns the BLOB containing the file content.

create or replace
    function loader(p_filename varchar2) return blob is
      bf bfile := bfilename('DM_DUMP',p_filename);
      b blob;
    begin
      dbms_lob.createtemporary(b,true);
      dbms_lob.fileopen(bf, dbms_lob.file_readonly);
      dbms_lob.loadfromfile(b,bf,dbms_lob.getlength(bf));
      dbms_lob.fileclose(bf);
     return b;
   end;
   /

Create image_data Table

The following example creates the image_data table assuming image files are under the DM_DUMP directory. The image_data table is used further for generating vector embeddings.

SQL> CREATE TABLE image_data (
      ID NUMBER,
      NAME VARCHAR2(20),
      IMAGE BLOB
    );

Table created.


SQL> insert into image_data values (1,'cat.jpg',loader('cat.jpg'));

1 row created.

SQL> insert into image_data values (2,'cat2.jpg',loader('cat2.jpg'));

1 row created.

SQL> insert into image_data values (3,'chicken.jpg',loader('chicken.jpg'));

1 row created.

SQL> insert into image_data values (4,'horse.jpg',loader('horse.jpg'));

1 row created.

SQL> insert into image_data values (5,'dog.jpg',loader('dog.jpg'));

1 row created.

SQL> insert into image_data values (6,'cat.png',loader('cat.png'));

1 row created.

SQL> commit;

Commit complete.

Load a ResNet-50 Computer Image Transformer and Generate Vector Embeddings

The following example demonstrates loading an image tranformer model extended with image pre-processing pipeline and using it to generate vector embeddings from images stored in a BLOB. The example assumes that the DM_DUMP directory exists and contains the ONNX file for ResNet-50 model augmented with ONNX-based image pre-processing pipeline.

The example imports a pretrained ONNX-format transformer model (pp_resnet_50.onnx) into the database as ppresnet50 using the DBMS_VECTOR.LOAD_ONNX_MODEL procedure. Alternately, you can load the model using the DBMS_DATA_MINING.IMPORT_ONNX_MODEL. After checking the dictionary views, and examining the schema of the image_data table, the model runs a query that generates vector embeddings for each image stored in the image_data table using the VECTOR_EMBEDDING operator. The vector embeddings can be further used for image classification, similarity search, or feature extraction. The query returns the first 40 characters of each vector. For unsupported formats such as cat.png (a PNG file), the VECTOR_EMBEDDING operator returns a NULL value.

-- Metadata for an embedding model
SQL> define ppjsonmd = '{"function" : "embedding"}';

SQL> exec DBMS_VECTOR.LOAD_ONNX_MODEL('DM_DUMP', 'pp_resnet_50.onnx', 'ppresnet50');

PL/SQL procedure successfully completed.


SQL> SELECT mining_function, algorithm, model_size FROM user_mining_models WHERE model_name = 'PPRESNET50';

MINING_FUNCTION                ALGORITHM            MODEL_SIZE
------------------------------ -------------------- ----------
EMBEDDING                      ONNX                 93979933


SQL> SELECT attribute_name, attribute_type, data_type, vector_info FROM user_mining_model_attributes WHERE model_name = 'PPRESNET50' ORDER BY 1;

ATTRIBUTE_NAME              ATTRIBUTE_TYPE       DATA_TYPE  VECTOR_INFO
--------------------    --------------------  ----------    --------------------
DATA                         UNSTRUCTURED         BLOB
ORA$ONNXTARGET               VECTOR               VECTOR     VECTOR(2048,FLOAT32)

 
SQL> describe image_data
 Name                                                                                                                    Null?    Type
 ----------------------------------------------------------------- -------- --------------------------------------------
 ID                                                                                                                                              NUMBER
 NAME                                                                                                                                      VARCHAR2(20)
 IMAGE                                                                                                                                     BLOB

 
SQL> SELECT name, substr(vector_embedding(ppresnet50 using image as data), 0, 40) as vec FROM image_data;

NAME                       VEC
-------------------- ----------------------------------------
cat.jpg              [0,3.69947255E-002,1.727576E-002,0,6.437
cat2.jpg             [5.25364205E-002,0,0,2.8940714E-003,0,4.
chicken.jpg          [2.14146048E-001,7.94866239E-004,2.95593
horse.jpg            [1.63398478E-002,0,4.99145657E-001,0,0,1
dog.jpg              [0,0,7.96773005E-004,0,0,0,1.00504747E-0
cat.png

6 rows selected.

Alternately, use the DBMS_DATA_MINING.IMPORT_ONNX_MODEL procedure to import the ppresnet50 model into the database and proceed with the rest of the steps as shown in the example. Here, the loader function loads the content of the file or ONNX files into a blob.

SQL> exec DBMS_DATA_MINING.IMPORT_ONNX_MODEL('ppresnet50', loader('pp_resnet_50.onnx'), JSON('&ppjsonmd'));

PL/SQL procedure successfully completed.

Load a CLIP ViT Model to Generate Vector Embeddings from Images (Image Modality) and Search Images by Generating Embedding from Text Description (Text Modality)

The following example uses CLIP ViT Base patch model (ppclip) to check pre-configured ONNX-based image embedding pipeline and generates vector embeddings. The example assumes that the DM_DUMP directory exists and contains the ONNX files for each modality of the CLIP ViT Base patch model. The pp_clip_img.onnx holds the model augmented with ONNX-based image pre-processing and post-processing pipelines needed for image modality. The pp_clip_txt.onnx holds the model augmented with ONNX-based pre-processing and post-processing pipelines for text modality. Follows the steps in ONNX Pipeline Models: Multi-modal Embedding to get the ONNX files for each of the modality of the CLIP ViT Base patch model.

SQL> set echo on
SQL> -- Import clip model with image preprocessing (image modality)
SQL> exec DBMS_VECTOR.LOAD_ONNX_MODEL('DM_DUMP', 'pp_clip_img.onnx', 'clipimg');

PL/SQL procedure successfully completed.

SQL> -- Import clip model with text preprocessing (text modality)
SQL> exec DBMS_VECTOR.LOAD_ONNX_MODEL('DM_DUMP', 'pp_clip_txt.onnx', 'cliptxt');

PL/SQL procedure successfully completed.

SQL> -- Show difference between the two modality:
SQL> SELECT model_name, attribute_name, attribute_type, data_type, vector_info FROM user_mining_model_attributes WHERE model_name LIKE 'CLIP%' ORDER BY 1,2;


MODEL_NAME     ATTRIBUTE_NAME 	 ATTRIBUTE_TY 	   DATA_TYPE 	    VECTOR_INFO
----------     ---------------- ------------       ---------------- --------------------
CLIPIMG 	DATA 		    UNSTRUCTURED           BLOB
CLIPIMG 	ORA$ONNXTARGET 	    VECTOR 	           VECTOR 	    VECTOR(512,FLOAT32)
CLIPTXT 	DATA 		    TEXT 		   VARCHAR2
CLIPTXT 	ORA$ONNXTARGET 	    VECTOR 	           VECTOR 	    VECTOR(512,FLOAT32)


SQL> -- Create a table with vectors generated from image using clip
SQL> CREATE TABLE image_vectors as select name, vector_embedding(clipimg using image as data) as embedding FROM image_data;

Table created.

SQL> -- Find top-3 similar image from text description
SQL> select name from image_vectors order by vector_distance(vector_embedding(cliptxt using 'Cat picture' as data), embedding) fetch first 2 rows only;

NAME
--------------------
cat.jpg
cat2.jpg

Alternately, use the DBMS_DATA_MINING.IMPORT_ONNX_MODEL procedure to load the clipimg and cliptxt models into the database.

-- Import CLIP model with image preprocessing (image modality)
SQL> exec DBMS_DATA_MINING.IMPORT_ONNX_MODEL('clipimg', loader('pp_clip_img.onnx'), JSON('{"function" : "embedding"}'));

PL/SQL procedure successfully completed.

-- Import CLIP model with text preprocessing (text modality)
SQL> exec DBMS_DATA_MINING.IMPORT_ONNX_MODEL('cliptxt', loader('pp_clip_txt.onnx'), JSON('{"function" : "embedding"}'));

PL/SQL procedure successfully completed.