3.1 About Oracle Text Indexes
The discussion of Oracle Text indexes includes the different types of indexes, their structure, the indexing process, and limitations.
The following topics provide information about Oracle Text indexes:
3.1.1 Types of Oracle Text Indexes
With Oracle Text, you create indexes by using the CREATE
INDEX
statement.
Table 3-1 Oracle Text Index Types
Index Type | Description | Supported Preferences and Parameters | Query Operator | Notes |
---|---|---|---|---|
|
Use this index to build a text retrieval application when your text consists of large, coherent documents in, for example, MS Word, HTML, or plain text. You can customize the index in a variety of ways. This index type requires |
All Supported parameters: index partition clause format, charset, and language columns |
The Use the |
Supports all documents services and query services. Supports indexing of partitioned text tables. Supports |
|
Use this index to build a text retrieval application when your text consists of large, coherent documents in, for example, MS Word, HTML, or plain text. You can customize the index in a variety of ways. This index type requires |
All Supported parameters: index partition clause format, charset, and language columns |
The Use the |
Supports all documents services and query services. Supports indexing of partitioned text tables. Supports sharded databases and system managed partitioning for index storage tables. |
|
Use this index for better mixed query performance of small documents and text fragments. To improve mixed query performance, include other columns in the base table, such as item names, prices, and descriptions. This index type is transactional. It automatically updates itself after inserts, updates, or deletes to the base table. Note: The Oracle Text indextype
|
Not supported: Format, charset, and language columns Table and index partitioning |
The Use the Theme querying is supported. |
This index is larger and takes longer to build than a The size of a The |
|
Use this index to build a document classification or routing application. Create this index on a table of queries, where the queries define the classification or routing criteria.. |
|
Use the To build a document classification application by using simple or rule-based classification, create an index of type |
An Oracle Text index is an Oracle Database domain index. To build your query application, you can create an index of type CONTEXT
with a mixture of text and structured data columns, and query it with the CONTAINS
operator.
You create an index from a populated text table. In a query application, the table must contain the text or pointers to the location of the stored text. Text is usually a collection of documents, but it can also be small text fragments.
Note:
If you are building a new application that uses XML data, Oracle recommends that you use XMLIndex,
not CTXRULE.
Create an Oracle Text index as a type of extensible index to Oracle Database by using standard SQL. This means that an Oracle Text index operates like an Oracle Database index. It has a name by which it is referenced and can be manipulated with standard SQL statements.
The benefit of creating an Oracle Text index is fast response time for text queries with the CONTAINS
, CATSEARCH
, and MATCHES
operators. These operators query the CONTEXT
, CTXCAT
, and CTXRULE
index types, respectively.
Note:
Because a Transparent Data Encryption-enabled column does not support domain indexes, do not use it with Oracle Text. However, you can create an Oracle Text index on a column in a table that is stored in a Transparent Data Encryption-enabled tablespace.
See Also:
-
Oracle XML DB Developer's Guide for information about
XMLIndex
and indexingXMLType
data
3.1.2 Structure of the Oracle Text CONTEXT Index
Oracle Text indexes text by converting all words into tokens. The general structure of an Oracle Text CONTEXT
index is an inverted index, where each token contains the list of documents (rows) that contain the token.
For example, after a single initial indexing operation, the word DOG might have an entry as follows:
Word | Appears in Document |
---|---|
DOG |
DOC1 DOC3 DOC5 |
This means that the word DOG is contained in the rows that store documents one, three, and five.
Merged Word and Theme Indexing
By default in English and French, Oracle Text indexes theme information with word information. You can query theme information with the ABOUT
operator. You can also enable and disable theme indexing.
See Also:
"Creating Preferences " to learn more about indexing theme information
3.1.3 Oracle Text Indexing Process
This section describes the Oracle Text indexing process. Initiate the indexing process by using the CREATE
INDEX
statement to create an Oracle Text index of tokens, organized according to your parameters and preferences.
Figure 3-1 shows the indexing process. This process is a data stream that is acted upon by the different indexing objects. Each object corresponds to an indexing preference type or section group that you can specify in the parameter string of CREATE
INDEX
or ALTER
INDEX.
Oracle Text processes the data stream with the following objects and engine:
3.1.3.1 Datastore Object
The stream starts with the datastore reading in the documents as they are stored in the system according to your datastore preference.
For example, if you defined your datastore as DIRECTORY_DATASTORE,
then the stream starts by reading the files from an Oracle directory object. You can also store your documents on the internet or in Oracle Database. Wherever your files reside physically, a text table in Oracle Database must always point to the files.
3.1.3.2 Filter Object
The stream then passes through the filter. Your FILTER
preference determines what happens. The stream can be acted upon in one of the following ways:
-
No filtering takes place when you specify the
NULL_FILTER
preference type or when the value of the format column isIGNORE.
Documents that are plain text, HTML, or XML need no filtering. -
Formatted documents (binary) are filtered to marked-up text when you specify the
AUTO_FILTER
preference type or when the value of the format column isBINARY.
3.1.3.3 Sectioner Object
After being filtered, the marked-up text passes through the sectioner, which separates the stream into text and section information. Section information includes where sections begin and end in the text stream. The type of sections that are extracted is determined by your section group type.
The text is passed to the lexer. The section information is passed directly to the indexing engine, which uses it later.
3.1.3.4 Lexer Object
You create a lexer preference by using one of the Oracle Text lexer types to specify the language of the text to be indexed. The lexer breaks the text into tokens according to your language. These tokens are usually words. To extract tokens, the lexer uses the parameters that are defined in your lexer preference. These parameters include the definitions for the characters that separate tokens, such as whitespace. Parameters also include whether to convert the text to all uppercase or to leave it in mixed case.
When you enable theme indexing, the lexer analyzes your text to create theme tokens for indexing.
3.1.3.5 Indexing Engine
The indexing engine creates the inverted index that maps tokens to the documents that contain them. In this phase, Oracle Text uses the stoplist that you specify to exclude stopwords or stopthemes from the index. Oracle Text also uses the parameters that are defined in your WORDLIST
preference. Those parameters tell the system how to create a prefix index or substring index, if enabled.
3.1.4 About Updates to Indexed Columns
You can keep documents available for search operations until the index is synchronized, without immediately performing index synchronization.
In releases prior to Oracle Database 12c Release 2 (12.2), when there is an update to the column on which an Oracle Text index is based, the document is unavailable for search operations until the index is synchronized. User queries cannot perform a search of this document. Starting with Oracle Database 12c Release 2 (12.2), you can specify that documents must be searchable after updates, without immediately performing index synchronization. Before the index is synchronized, queries use the old index entries to fetch the contents of the old document. After index synchronization, user queries fetch the contents of the updated document.
The ASYNCHRONOUS_UPDATE
option for indexes enables you to retain the old contents of a document after an update and then use this index to answer user queries.
Note:
The ASYNCHRONOUS_UPDATE
setting of the CONTEXT
indextype is deprecated in Oracle Database 23ai, and can be ignored or removed in a future release.
Oracle can ignore or remove this attribute in a future release. Oracle recommends that you allow this value to be set to its default value, SYNCHRONOUS_UPDATE
. To avoid unexpected loss of results during updates, use SYNC (ON COMMIT)
or SYNC(EVERY [time-period])
with a short time period.
The ASYNCHRONOUS_UPDATE
setting was introduced as a workaround for the fact that updates are implemented as "delete followed by insert," and that deletes are immediate (on commit), while inserts are only performed during an index sync. However, this setting is incompatible with several other index options. Oracle recommends that you discontinue its use.
Related Topics
3.1.5 Partitioned Tables and Indexes
When you create a partitioned CONTEXT
index on a partitioned text table, you must partition the table by range. Hash, composite, and list partitions are not supported.
You can create a partitioned text table to partition your data by date. For example, if your application maintains a large library of dated news articles, you can partition your information by month or year. Partitioning simplifies the manageability of large databases, because querying, insert, update, delete operations, and backup and recovery can act on a single partition.
On local CONTEXT
indexes with multiple table sets, Oracle Text supports the number of partitions supported by Oracle Database.
Note:
The number of partitions that are supported in Oracle Text is approximately 1024K-1. This limit, which should be more than adequate, is not applicable to a CONTEXT
index on partitioned tables.
See Also:
Oracle Database Concepts for more information about partitioning
To query a partitioned table, use CONTAINS
in the WHERE
clause of a SELECT
statement as you query a regular table. You can query the entire table or a single partition. However, if you are using the ORDER
BY
SCORE
clause, Oracle recommends that you query single partitions unless you include a range predicate that limits the query to a single partition.
3.1.6 Online Indexes
When it is not practical to lock your base table for indexing because of ongoing updates, you can create your index online with the ONLINE
parameter of CREATE INDEX
statement. This way an application with frequent inserts, updates, or deletes does not have to stop updating the base table for indexing.
There are short periods, however, when the base table is locked at the beginning and end of the indexing process.
See Also:
Oracle Text Reference to learn more about creating an index online
3.1.7 Parallel Indexing
Oracle Text supports parallel indexing with the CREATE
INDEX
statement.
When you enter a parallel indexing statement on a nonpartitioned table, Oracle Text splits the base table into temporary partitions, spawns child processes, and assigns a child to a partition. Each child then indexes the rows in its partition. The method of slicing the base table into partitions is determined by Oracle and is not under your direct control. This is true as well for the number of child processes actually spawned, which depends on machine capabilities, system load, your init.ora
settings, and other factors. Because of these variables, the actual parallel degree may not match the degree of parallelism requested.
Because indexing is an intensive I/O operation, parallel indexing is most effective in decreasing your indexing time when you have distributed disk access and multiple CPUs. Parallel indexing can affect the performance of an initial index only with the CREATE
INDEX
statement. It does not affect insert, update, and delete operations with ALTER
INDEX,
and has minimal effect on query performance.
Because parallel indexing decreases the initial indexing time, it is useful for the following scenarios:
3.1.8 Indexing and Views
If you want to index documents that have contents in different tables, then create a ubiquitous search index.
You can use the DBMS_SEARCH PL/SQL
package to create a ubiquitous search index on multiple tables and views within a schema. You can create this index only on the views that have a primary key and a foreign key constraint relationship with the component table.
Alternatively, you can create a data storage preference by using the USER_DATASTORE
object. With this object, you can define a procedure that synthesizes documents from different tables at index time.
Oracle Text supports the creation of CONTEXT
, CTXCAT
, and CTXRULE
indexes on materialized views (MVIEW
).