UTL_TO_TEXT

Use the DBMS_VECTOR_CHAIN.UTL_TO_TEXT chainable utility function to convert an input document (for example, PDF, DOC, JSON, XML, or HTML) to plain text.

Purpose

To perform a file-to-text transformation by using the Oracle Text component (CONTEXT) of Oracle Database.

Syntax

DBMS_VECTOR_CHAIN.UTL_TO_TEXT (
    DATA          IN CLOB | BLOB,
    PARAMS        IN JSON default NULL
) return CLOB;

DATA

This function accepts input data of type CLOB or BLOB. The documents it reads can come from a remote location or from files stored locally in database tables.

It returns a plain text version of the document as a CLOB.
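The following is a minimal sketch of both input types. It assumes three hypothetical names: a directory object `DOC_DIR` that already exists (created by a suitably privileged user with CREATE DIRECTORY) and points to a server folder containing `sample.pdf`, and a table `doc_tab`. It also assumes a release in which `TO_BLOB` accepts a BFILE. Because `PARAMS` defaults to NULL, the calls omit it.

```sql
-- Hypothetical table that stores source documents as BLOBs.
CREATE TABLE doc_tab (id NUMBER PRIMARY KEY, data BLOB);

-- Load a PDF from the hypothetical DOC_DIR directory object into the table.
INSERT INTO doc_tab VALUES (1, TO_BLOB(BFILENAME('DOC_DIR', 'sample.pdf')));
COMMIT;

-- BLOB input: PARAMS is omitted, so the default settings apply.
SELECT DBMS_VECTOR_CHAIN.UTL_TO_TEXT(d.data) AS doc_text
FROM   doc_tab d
WHERE  d.id = 1;

-- CLOB input: an HTML fragment supplied as a CLOB.
SELECT DBMS_VECTOR_CHAIN.UTL_TO_TEXT(
         TO_CLOB('<html><body><p>Hello, world</p></body></html>')
       ) AS html_text
FROM   dual;
```

Both queries return a CLOB. No specific output is shown here because the extracted text depends on the source document.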

Oracle Text supports around 150 file types. For a complete list of all the supported document formats, see Oracle Text Reference.

PARAMS

Specify the following input parameters in JSON format:

{ 
    "plaintext" : "true or false",
    "charset"   : "UTF8" 
}

Table 12-31 Parameter Details

Parameter Description

plaintext

Whether to return the output as plain text.

The default value for this parameter is true; that is, by default the output format is plain text.

If you do not want the document returned as plain text, set this parameter to false.

charset

Character set encoding.

Currently, only UTF8 is supported.
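For comparison, this sketch sets `plaintext` to `false`. It assumes a hypothetical table `doc_tab` with a BLOB column `data`. This section does not specify the format of the non-plain-text output, so inspect the result rather than relying on a particular structure.

```sql
-- Request output that is not plain text (plaintext = false).
-- charset must stay UTF8, the only value currently supported.
SELECT DBMS_VECTOR_CHAIN.UTL_TO_TEXT(
         d.data,
         JSON('{"plaintext": "false", "charset": "UTF8"}')
       ) AS doc_output
FROM   doc_tab d;
```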

Example

-- tab is a table whose blobdata column holds the source documents.
select DBMS_VECTOR_CHAIN.UTL_TO_TEXT (
         t.blobdata,
         json('{
                 "plaintext": "true",
                 "charset"  : "UTF8"
               }')
       ) as doc_text
from tab t;
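To work with the result in PL/SQL rather than in a query, you can select it into a CLOB variable. The following sketch assumes a hypothetical table `doc_tab` (columns `id` and `data BLOB`) that contains a row with `id = 1`. It also assumes you run it in SQL*Plus or SQLcl, where `SET SERVEROUTPUT ON` displays `DBMS_OUTPUT` lines.

```sql
SET SERVEROUTPUT ON

DECLARE
  -- Holds the converted document.
  v_text CLOB;
BEGIN
  -- Convert one row of the hypothetical doc_tab table to plain text.
  -- Raises NO_DATA_FOUND if no row has id = 1.
  SELECT DBMS_VECTOR_CHAIN.UTL_TO_TEXT(
           d.data,
           JSON('{"plaintext": "true", "charset": "UTF8"}'))
  INTO   v_text
  FROM   doc_tab d
  WHERE  d.id = 1;

  -- Report the text length, then preview the first 200 characters.
  DBMS_OUTPUT.PUT_LINE('Length: ' || DBMS_LOB.GETLENGTH(v_text));
  DBMS_OUTPUT.PUT_LINE(DBMS_LOB.SUBSTR(v_text, 200, 1));
END;
/
```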

End-to-end example:

To run an end-to-end example scenario using this function, see Convert File to Text to Chunks to Embeddings Within Oracle Database.
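Because the function is chainable, its CLOB result can feed the next utility directly. The following rough sketch passes the text to `DBMS_VECTOR_CHAIN.UTL_TO_CHUNKS` with default chunking settings. It assumes a hypothetical table `doc_tab` with a BLOB column `data`. It also assumes that each row returned by `UTL_TO_CHUNKS` exposes a `COLUMN_VALUE` JSON document with `chunk_id` and `chunk_data` fields. Verify those field names against the UTL_TO_CHUNKS reference before relying on them.

```sql
-- Chain text extraction into chunking for every row of the hypothetical doc_tab.
-- JSON_VALUE returns NULL (its default error handling) for any chunk_data
-- longer than the 4000-byte VARCHAR2 return type.
SELECT d.id,
       JSON_VALUE(c.column_value, '$.chunk_id' RETURNING NUMBER)           AS chunk_id,
       JSON_VALUE(c.column_value, '$.chunk_data' RETURNING VARCHAR2(4000)) AS chunk_text
FROM   doc_tab d,
       DBMS_VECTOR_CHAIN.UTL_TO_CHUNKS(
         DBMS_VECTOR_CHAIN.UTL_TO_TEXT(d.data)
       ) c;
```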