Image Transformer ONNX Format Models
Image transformer is a part of machine learning that helps computers interpret and analyze images and videos. It provides tools to perform tasks like creating image embeddings (using an image transformer), classifying objects, detecting anomalies, and identifying objects in pictures or videos.
- Decoding images from formats like JPEG to a 3D numeric array.
- Resizing images to standard dimensions.
- Normalizing pixel values.
- Reducing noise in the image.
- Cropping parts of the image for focus.
Image transformer models can be converted into the ONNX format and used directly in Oracle Database. Each image transformer requires its own specific pre-processing pipeline and Oracle offers OML4Py pre-processing pipeline for such models.
Pretrained Image Transformer Models in Oracle Database
Oracle Database supports using pretrained image transformer models for generating vectors for semantic similarity search.
You can access image transformer models through machine learning platforms like Hugging Face that provide pretrained models for immediate use.
- Download pretrained models: Download image transformer models into the database.
- Convert image transformer model to ONNX format: Use ONNX pipeline to convert the pretrained image transformer model to ONNX format. Add image pre-processing by implementing Oracle's custom ONNX operation for image decoding and create a model-specific ONNX pre-processing pipeline. See Import Pretrained Models in ONNX Format for Vector Generation Within the Database for more details.
- Import ONNX format image transformer model: Use the
DBMS_VECTOR.LOAD_ONNX_MODEL
procedure orDBMS_DATA_MINING.IMPORT_ONNX_MODEL
to import the ONNX model into your Oracle database. After importing, use theVECTOR_EMBEDDING
operator to generate vector embeddings from JPEG images stored as BLOB in the database.
Note:
Only JPEG images are supported. Multiple ONNX models may have to be loaded for multi-modal model because each modality has a different pre-processing and post-processing pipeline.- ResNet-50: A widely used model for image classification.
- CLIP ViT-Base-Patch32: A multi-modal model for linking text and image content.
- ViT Base-Patch: A vision transformer model designed for image analysis and classification.
Example: Generate Embeddings from Image Transformer Models
The following examples illustrate generating embeddings from images with
image transformer model using DBMS_VECTOR
or
DBMS_DATA_MINING
packages and use the ONNX Runtime for inference
through the SQL prediction operators.
- the data set is available to the user.
- the
DM_DUMP
directory exists and contains the ONNX model file for image transformer models augmented with image pre-processing. Follow the steps in ONNX Pipeline Models: Image Embedding and ONNX Pipeline Models: Multi-modal Embedding to generate the ONNX files for the ResNet-50 and Clip ViT models. See also Import ONNX Models into Oracle Database End-to-End Example.
The following example loads the contents of a file stored in
a directory object (DM_DUMP
) into a BLOB in the database. The
function returns the BLOB
containing the file
content.
create or replace
function loader(p_filename varchar2) return blob is
bf bfile := bfilename('DM_DUMP',p_filename);
b blob;
begin
dbms_lob.createtemporary(b,true);
dbms_lob.fileopen(bf, dbms_lob.file_readonly);
dbms_lob.loadfromfile(b,bf,dbms_lob.getlength(bf));
dbms_lob.fileclose(bf);
return b;
end;
/
The following example creates the image_data
table
assuming image files are under the DM_DUMP
directory. The
image_data
table is used further for generating vector
embeddings.
SQL> CREATE TABLE image_data (
ID NUMBER,
NAME VARCHAR2(20),
IMAGE BLOB
);
Table created.
SQL> insert into image_data values (1,'cat.jpg',loader('cat.jpg'));
1 row created.
SQL> insert into image_data values (2,'cat2.jpg',loader('cat2.jpg'));
1 row created.
SQL> insert into image_data values (3,'chicken.jpg',loader('chicken.jpg'));
1 row created.
SQL> insert into image_data values (4,'horse.jpg',loader('horse.jpg'));
1 row created.
SQL> insert into image_data values (5,'dog.jpg',loader('dog.jpg'));
1 row created.
SQL> insert into image_data values (6,'cat.png',loader('cat.png'));
1 row created.
SQL> commit;
Commit complete.
The following example
demonstrates loading an image tranformer model extended with image pre-processing
pipeline and using it to generate vector embeddings from images stored in a BLOB.
The example assumes that the DM_DUMP
directory exists and contains
the ONNX file for ResNet-50 model augmented with ONNX-based image pre-processing
pipeline.
The example imports a pretrained ONNX-format transformer model
(pp_resnet_50.onnx
) into the database as
ppresnet50
using the
DBMS_VECTOR.LOAD_ONNX_MODEL
procedure. Alternately, you can
load the model using the DBMS_DATA_MINING.IMPORT_ONNX_MODEL
. After
checking the dictionary views, and examining the schema of the
image_data
table, the model runs a query that generates vector
embeddings for each image stored in the image_data
table using the
VECTOR_EMBEDDING
operator. The vector embeddings can be further
used for image classification, similarity search, or feature extraction. The query
returns the first 40 characters of each vector. For unsupported formats such as
cat.png
(a PNG file), the VECTOR_EMBEDDING
operator returns a NULL value.
-- Metadata for an embedding model
SQL> define ppjsonmd = '{"function" : "embedding"}';
SQL> exec DBMS_VECTOR.LOAD_ONNX_MODEL('DM_DUMP', 'pp_resnet_50.onnx', 'ppresnet50');
PL/SQL procedure successfully completed.
SQL> SELECT mining_function, algorithm, model_size FROM user_mining_models WHERE model_name = 'PPRESNET50';
MINING_FUNCTION ALGORITHM MODEL_SIZE
------------------------------ -------------------- ----------
EMBEDDING ONNX 93979933
SQL> SELECT attribute_name, attribute_type, data_type, vector_info FROM user_mining_model_attributes WHERE model_name = 'PPRESNET50' ORDER BY 1;
ATTRIBUTE_NAME ATTRIBUTE_TYPE DATA_TYPE VECTOR_INFO
-------------------- -------------------- ---------- --------------------
DATA UNSTRUCTURED BLOB
ORA$ONNXTARGET VECTOR VECTOR VECTOR(2048,FLOAT32)
SQL> describe image_data
Name Null? Type
----------------------------------------------------------------- -------- --------------------------------------------
ID NUMBER
NAME VARCHAR2(20)
IMAGE BLOB
SQL> SELECT name, substr(vector_embedding(ppresnet50 using image as data), 0, 40) as vec FROM image_data;
NAME VEC
-------------------- ----------------------------------------
cat.jpg [0,3.69947255E-002,1.727576E-002,0,6.437
cat2.jpg [5.25364205E-002,0,0,2.8940714E-003,0,4.
chicken.jpg [2.14146048E-001,7.94866239E-004,2.95593
horse.jpg [1.63398478E-002,0,4.99145657E-001,0,0,1
dog.jpg [0,0,7.96773005E-004,0,0,0,1.00504747E-0
cat.png
6 rows selected.
Alternately,
use the DBMS_DATA_MINING.IMPORT_ONNX_MODEL
procedure to import the
ppresnet50
model into the database and proceed with the rest of
the steps as shown in the example. Here, the loader function loads the content of
the file or ONNX files into a blob.
SQL> exec DBMS_DATA_MINING.IMPORT_ONNX_MODEL('ppresnet50', loader('pp_resnet_50.onnx'), JSON('&ppjsonmd'));
PL/SQL procedure successfully completed.
The following
example uses CLIP ViT Base patch model (ppclip
) to check
pre-configured ONNX-based image embedding pipeline and generates vector embeddings.
The example assumes that the DM_DUMP
directory exists and contains
the ONNX files for each modality of the CLIP ViT Base patch model. The
pp_clip_img.onnx
holds the model augmented with ONNX-based
image pre-processing and post-processing pipelines needed for image modality. The
pp_clip_txt.onnx
holds the model augmented with ONNX-based
pre-processing and post-processing pipelines for text modality. Follows the steps in
ONNX Pipeline Models: Multi-modal Embedding
to get the ONNX files for each of the modality of the CLIP ViT Base patch
model.
SQL> set echo on
SQL> -- Import clip model with image preprocessing (image modality)
SQL> exec DBMS_VECTOR.LOAD_ONNX_MODEL('DM_DUMP', 'pp_clip_img.onnx', 'clipimg');
PL/SQL procedure successfully completed.
SQL> -- Import clip model with text preprocessing (text modality)
SQL> exec DBMS_VECTOR.LOAD_ONNX_MODEL('DM_DUMP', 'pp_clip_txt.onnx', 'cliptxt');
PL/SQL procedure successfully completed.
SQL> -- Show difference between the two modality:
SQL> SELECT model_name, attribute_name, attribute_type, data_type, vector_info FROM user_mining_model_attributes WHERE model_name LIKE 'CLIP%' ORDER BY 1,2;
MODEL_NAME ATTRIBUTE_NAME ATTRIBUTE_TY DATA_TYPE VECTOR_INFO
---------- ---------------- ------------ ---------------- --------------------
CLIPIMG DATA UNSTRUCTURED BLOB
CLIPIMG ORA$ONNXTARGET VECTOR VECTOR VECTOR(512,FLOAT32)
CLIPTXT DATA TEXT VARCHAR2
CLIPTXT ORA$ONNXTARGET VECTOR VECTOR VECTOR(512,FLOAT32)
SQL> -- Create a table with vectors generated from image using clip
SQL> CREATE TABLE image_vectors as select name, vector_embedding(clipimg using image as data) as embedding FROM image_data;
Table created.
SQL> -- Find top-3 similar image from text description
SQL> select name from image_vectors order by vector_distance(vector_embedding(cliptxt using 'Cat picture' as data), embedding) fetch first 2 rows only;
NAME
--------------------
cat.jpg
cat2.jpg
DBMS_DATA_MINING.IMPORT_ONNX_MODEL
procedure
to load the clipimg
and cliptxt
models into the
database.-- Import CLIP model with image preprocessing (image modality)
SQL> exec DBMS_DATA_MINING.IMPORT_ONNX_MODEL('clipimg', loader('pp_clip_img.onnx'), JSON('{"function" : "embedding"}'));
PL/SQL procedure successfully completed.
-- Import CLIP model with text preprocessing (text modality)
SQL> exec DBMS_DATA_MINING.IMPORT_ONNX_MODEL('cliptxt', loader('pp_clip_txt.onnx'), JSON('{"function" : "embedding"}'));
PL/SQL procedure successfully completed.