3.6 Improved Document Services Performance with a Forward Index

When Oracle Text searches for a word in a document, it uses an inverted index to find the matches and then calculates a snippet from each matching document to display the results. To calculate the snippet, Oracle Text reindexes each document returned as part of the search result, so the operation slows down considerably when a document is very large.

The forward index overcomes the performance problem of very large documents. It uses a $O mapping table that refers to the token offsets in the $I inverted index table. Each token offset is translated into the character offset in the original document, and the text surrounding the character offset is then used to generate the text snippet.

Because the forward index does not index documents in memory while calculating the snippet, it performs considerably better than the inverted index alone when searching for a word in very large documents.

The forward index improves the performance of the following procedures in the Oracle Text CTX_DOC package:

  • CTX_DOC.SNIPPET

  • CTX_DOC.HIGHLIGHT

  • CTX_DOC.MARKUP

See Also:

Oracle Text Reference for information about the forward_index parameter clause of the BASIC_STORAGE indexing type

3.6.1 Enabling Forward Index

The following example enables the forward index feature by setting the forward_index attribute value of the BASIC_STORAGE storage type to TRUE:

exec ctx_ddl.create_preference('mystore', 'BASIC_STORAGE');
exec ctx_ddl.set_attribute('mystore','forward_index','TRUE');
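
A storage preference takes effect only when an index is created with it. The following is a minimal sketch, assuming the mystore preference from the preceding statements; the my_docs table and my_docs_idx index are hypothetical names used only for illustration.

```sql
-- Hypothetical table; any table with a text column would do.
create table my_docs(
  id   number primary key,
  txt  clob
);

-- The storage clause attaches the mystore preference
-- (forward_index = TRUE) to the new index.
create index my_docs_idx on my_docs(txt) indextype is ctxsys.context
    parameters('storage mystore');
```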

3.6.2 Forward Index with Snippets

In some cases, snippets generated with the forward_index option differ slightly from the snippets generated without it. The differences are generally minimal, do not affect snippet quality, and typically consist of a few extra white spaces or newline characters.

3.6.3 Forward Index with Save Copy

To use the forward index effectively, store copies of the documents in the $D table, in either plain-text format or filtered format, depending on the CTX_DOC package procedure that you use. For example, store the document in plain-text format when you use the SNIPPET procedure, and store it in filtered format when you use the MARKUP or HIGHLIGHT procedure.

You should use the Save Copy feature of Oracle Text to store the copies of the documents in the $D table. Implement the feature by using the save_copy attribute or the save_copy column parameter.

  • save_copy basic storage attribute:

    The following example sets the save_copy attribute value of the BASIC_STORAGE storage type to PLAINTEXT. With this setting, Oracle Text saves a plain-text copy of each document in the $D table when it indexes the document, so the copy is available when you later search for a word in that document.

    exec ctx_ddl.create_preference('mystore', 'BASIC_STORAGE');
    exec ctx_ddl.set_attribute('mystore','save_copy','PLAINTEXT');
    
  • save_copy column index parameter:

    The following example uses the save_copy column index parameter to save a copy of a text document into the $D table. The create index statement creates the $D table and copies document 1 ("hello world") into the $D table.

    create table docs(
      id       number,
      txt      varchar2(64),
      save     varchar2(10)
    );
    
    insert into docs values(1, 'hello world', 'PLAINTEXT');
    
    create index idx on docs(txt) indextype is ctxsys.context
        parameters('save_copy column save');
    

For the save_copy attribute or column parameter, you can specify one of the following values:

  • PLAINTEXT saves a copy of the document in plain-text format in the $D index table. The plain-text format is defined as the output format of the sectioner. Specify this value when you use the SNIPPET procedure.

  • FILTERED saves a copy of a document in a filtered format in the $D index table. The filtered format is defined as the output format of the filter. Specify this value when you use the MARKUP or HIGHLIGHT procedure.

  • NONE does not save a copy of the document in the $D index table. Specify this value when you do not use the SNIPPET, MARKUP, or HIGHLIGHT procedure and when the indexed column is either VARCHAR2 or CLOB.
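
The two features can be combined in a single storage preference. The following is a minimal sketch, not a definitive setup: the fwd_copy_store preference, the big_docs table, and the big_docs_idx index are hypothetical names, and the sample row stands in for a very large document.

```sql
-- Hypothetical preference that enables both the forward index and Save Copy.
exec ctx_ddl.create_preference('fwd_copy_store', 'BASIC_STORAGE');
exec ctx_ddl.set_attribute('fwd_copy_store', 'forward_index', 'TRUE');
exec ctx_ddl.set_attribute('fwd_copy_store', 'save_copy', 'PLAINTEXT');

-- Hypothetical table; the single short row is only a placeholder.
create table big_docs(
  id   number primary key,
  txt  clob
);

insert into big_docs values(1, 'hello world');

create index big_docs_idx on big_docs(txt) indextype is ctxsys.context
    parameters('storage fwd_copy_store');

-- Identify documents by primary key in the CTX_DOC call that follows.
exec ctx_doc.set_key_type('PRIMARY_KEY');

-- Request a snippet for document 1 around the query term.
select ctx_doc.snippet('big_docs_idx', '1', 'hello') as snip from dual;
```

PLAINTEXT is chosen here because the sketch calls SNIPPET; a preference intended for MARKUP or HIGHLIGHT would set save_copy to FILTERED instead.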

3.6.4 Forward Index Without Save Copy

In the following scenarios, you can take advantage of the performance enhancement of forward index without saving copies of all documents in the $D table (that is, without using the Save Copy feature):

  • The document set contains HTML and plain text: Store all documents in the base table by using the DIRECT_DATASTORE or the MULTI_COLUMN_DATASTORE datastore type.

  • The document set contains HTML, plain text, and binary: Store all documents in the base table by using the DIRECT_DATASTORE datastore type. Store only the binary documents in the $D table in the filtered format.
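
One way to approach the second scenario is to drive Save Copy from a column, so that only the binary rows are copied into the $D table. The following sketch rests on several assumptions: all names (mixed_ds, mixed_store, mixed_docs, mixed_docs_idx, save_mode) are hypothetical, the documents are held in a BLOB column, and the predefined CTXSYS.AUTO_FILTER preference is assumed to be suitable for the formats in the set.

```sql
-- Hypothetical datastore and storage preferences.
exec ctx_ddl.create_preference('mixed_ds', 'DIRECT_DATASTORE');
exec ctx_ddl.create_preference('mixed_store', 'BASIC_STORAGE');
exec ctx_ddl.set_attribute('mixed_store', 'forward_index', 'TRUE');

-- All documents stay in the base table. The save_mode column decides,
-- row by row, whether a copy goes into the $D table.
create table mixed_docs(
  id         number primary key,
  doc        blob,
  save_mode  varchar2(10)   -- 'FILTERED' for binary rows, 'NONE' for HTML and plain text
);

create index mixed_docs_idx on mixed_docs(doc) indextype is ctxsys.context
    parameters('datastore mixed_ds filter ctxsys.auto_filter storage mixed_store save_copy column save_mode');
```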

3.6.5 Save Copy Without Forward Index

Even if you do not enable the forward index feature, the Save Copy feature improves the performance of the following procedures of the CTX_DOC package:

  • CTX_DOC.FILTER

  • CTX_DOC.GIST

  • CTX_DOC.THEMES

  • CTX_DOC.TOKENS
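
As an illustration, the following sketch calls the in-memory form of CTX_DOC.FILTER against the docs table and idx index from the save_copy column example in section 3.6.3. The variable names are hypothetical, and the block assumes that document 1 exists and that server output is enabled in the client.

```sql
set serveroutput on

declare
  v_key   varchar2(18);   -- rowid of the document, as a string
  v_text  clob;           -- NULL on entry, so CTX_DOC.FILTER allocates a temporary CLOB
begin
  -- The docs table has no primary key, so identify the document by rowid.
  ctx_doc.set_key_type('ROWID');

  select rowidtochar(rowid) into v_key from docs where id = 1;

  -- TRUE requests plain-text output.
  ctx_doc.filter('idx', v_key, v_text, TRUE);

  dbms_output.put_line(dbms_lob.substr(v_text, 200, 1));

  -- Release the temporary CLOB allocated by the call.
  dbms_lob.freetemporary(v_text);
end;
/
```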