3.6 Improved Document Services Performance with a Forward Index
When Oracle Text searches for a word in a document, it uses an inverted index and then displays the results by calculating a snippet from that document. To calculate the snippet, Oracle Text reindexes each document returned as part of the search result. As a result, the search operation slows down considerably when a document is very large.
The forward index overcomes the performance problem of very large documents. It uses a $O mapping table that refers to the token offsets in the $I inverted index table. Each token offset is translated into the character offset in the original document, and the text surrounding the character offset is then used to generate the text snippet.
Because the forward index does not use in-memory indexing of the documents while calculating the snippet, it provides considerably improved performance over the inverted index when searching for a word in very large documents.
The forward index improves the performance of the following procedures in the Oracle Text CTX_DOC package:
- CTX_DOC.SNIPPET
- CTX_DOC.HIGHLIGHT
- CTX_DOC.MARKUP
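As an illustration of the kind of call that benefits, the following sketch requests a snippet for a single document. It assumes a hypothetical CONTEXT index named doc_idx on a table with a primary key, a document whose key value is 1, and the query term world; none of these names come from this section. It also assumes that the session's CTX_DOC key type is the primary key.

```sql
set serveroutput on

declare
  snip varchar2(4000);
begin
  -- doc_idx, '1', and 'world' are hypothetical: index name, document key, query term
  snip := ctx_doc.snippet('doc_idx', '1', 'world');
  dbms_output.put_line(snip);
end;
/
```

The call itself is the same whether or not the index has a forward index; only the way Oracle Text locates the text around each hit changes.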
See Also:
Oracle Text Reference for information about the forward_index parameter clause of the BASIC_STORAGE indexing type
3.6.1 Enabling Forward Index
The following example enables the forward index feature by setting the forward_index attribute value of the BASIC_STORAGE storage type to TRUE:
exec ctx_ddl.create_preference('mystore', 'BASIC_STORAGE');
exec ctx_ddl.set_attribute('mystore','forward_index','TRUE');
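A storage preference affects an index only when the index is created with it. The following is a minimal sketch of that step, assuming the mystore preference from the example above already exists. The table name articles, its columns, and the index name articles_idx are hypothetical and used only for illustration.

```sql
-- Hypothetical table; the names articles, id, txt, and articles_idx are illustrative only
create table articles(
  id  number primary key,
  txt clob
);

-- Attach the mystore storage preference so that the index is built with a forward index
create index articles_idx on articles(txt)
  indextype is ctxsys.context
  parameters('storage mystore');
```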
3.6.2 Forward Index with Snippets
In some cases, when you use the forward_index option, generated snippets may be slightly different from the snippets that are generated when you do not use the forward_index option. The differences are generally minimal and do not affect snippet quality. Typically, they consist of a few extra white spaces or newline characters.
3.6.3 Forward Index with Save Copy
To use the forward index effectively, you should store copies of the documents in the $D table, either in plain-text format or filtered format, depending upon the CTX_DOC package procedure that you use. For example, store the document in plain-text format when you use the SNIPPET procedure, and store it in filtered format when you use the MARKUP or HIGHLIGHT procedure.
You should use the Save Copy feature of Oracle Text to store the copies of the documents in the $D table. Implement the feature by using the save_copy attribute or the save_copy column parameter.
- save_copy basic storage attribute: The following example sets the save_copy attribute value of the BASIC_STORAGE storage type to PLAINTEXT. This example enables Oracle Text to save a copy of the text document in the $D table while it searches for a word in that document.
  exec ctx_ddl.create_preference('mystore', 'BASIC_STORAGE');
  exec ctx_ddl.set_attribute('mystore','save_copy','PLAINTEXT');
- save_copy column index parameter: The following example uses the save_copy column index parameter to save a copy of a text document into the $D table. The create index statement creates the $D table and copies document 1 ("hello world") into the $D table.
  create table docs(
    id number,
    txt varchar2(64),
    save varchar2(10)
  );
  insert into docs values(1, 'hello world', 'PLAINTEXT');
  create index idx on docs(txt) indextype is ctxsys.context parameters('save_copy column save');
For the save_copy attribute or column parameter, you can specify one of the following values:
- PLAINTEXT saves a copy of the document in plain-text format in the $D index table. The plain-text format is defined as the output format of the sectioner. Specify this value when you use the SNIPPET procedure.
- FILTERED saves a copy of the document in filtered format in the $D index table. The filtered format is defined as the output format of the filter. Specify this value when you use the MARKUP or HIGHLIGHT procedure.
- NONE does not save a copy of the document in the $D index table. Specify this value when you do not use the SNIPPET, MARKUP, or HIGHLIGHT procedure and when the indexed column is either VARCHAR2 or CLOB.
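Because forward_index and save_copy are both attributes of the BASIC_STORAGE type, they can be set on the same preference. The following is a sketch of a setup aimed at the SNIPPET procedure. The preference name snippet_store, the table snippet_docs, and the index snippet_docs_idx are hypothetical names chosen for this illustration.

```sql
-- Hypothetical names: snippet_store, snippet_docs, snippet_docs_idx
exec ctx_ddl.create_preference('snippet_store', 'BASIC_STORAGE');
exec ctx_ddl.set_attribute('snippet_store', 'forward_index', 'TRUE');
exec ctx_ddl.set_attribute('snippet_store', 'save_copy', 'PLAINTEXT');

create table snippet_docs(
  id  number primary key,
  txt clob
);

-- The storage clause applies both attributes when the index is built
create index snippet_docs_idx on snippet_docs(txt)
  indextype is ctxsys.context
  parameters('storage snippet_store');
```

For an index that mainly serves MARKUP or HIGHLIGHT calls, the same sketch would use FILTERED in place of PLAINTEXT, as described in the list of values above.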
3.6.4 Forward Index Without Save Copy
In the following scenarios, you can take advantage of the performance enhancement of forward index without saving copies of all documents in the $D table (that is, without using the Save Copy feature):
- The document set contains HTML and plain text: Store all documents in the base table by using the DIRECT_DATASTORE or the MULTI_COLUMN_DATASTORE datastore type.
- The document set contains HTML, plain text, and binary: Store all documents in the base table by using the DIRECT_DATASTORE datastore type. Store only the binary documents in the $D table in the filtered format.
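The first scenario might be set up as in the following sketch, which stores HTML and plain-text documents directly in the base table and enables the forward index without setting save_copy. The names text_ds, fwd_store, web_docs, and web_docs_idx are hypothetical and are not taken from this section.

```sql
-- Hypothetical names: text_ds, fwd_store, web_docs, web_docs_idx
exec ctx_ddl.create_preference('text_ds', 'DIRECT_DATASTORE');
exec ctx_ddl.create_preference('fwd_store', 'BASIC_STORAGE');
exec ctx_ddl.set_attribute('fwd_store', 'forward_index', 'TRUE');

-- HTML and plain-text documents are stored directly in the indexed column
create table web_docs(
  id  number primary key,
  doc clob
);

insert into web_docs values(1, '<html><body>hello world</body></html>');
insert into web_docs values(2, 'hello again, world');

-- No save_copy attribute or column is specified, so no copies are requested for the $D table
create index web_docs_idx on web_docs(doc)
  indextype is ctxsys.context
  parameters('datastore text_ds storage fwd_store');
```

For the second scenario, one possible approach is the save_copy column parameter shown in section 3.6.3, with only the rows that hold binary documents set to FILTERED. Treat that as an assumption to verify against Oracle Text Reference.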