4.5 Creating a CONTEXT Index
The CONTEXT
index type is well suited for indexing large, coherent documents in formats such as Microsoft Word, HTML, or plain text.
With a CONTEXT
index, you can also customize your index in a variety of ways. The documents must be loaded in a text table.
4.5.1 CONTEXT Index and DML
A CONTEXT
index is not transactional. When you delete a record, the index is changed immediately. That is, your session no longer finds the record from the moment you make the change, and other users cannot find the record after you commit. For inserts and updates, the new information is not visible to text searches until an index synchronization has occurred. Therefore, when you perform inserts or updates on the base table, you must explicitly synchronize the index with CTX_DDL.SYNC_INDEX.
See Also:
4.5.2 Default CONTEXT Index Example
The following statement creates a default CONTEXT
index called myindex
on the text
column in the docs
table:
CREATE INDEX myindex ON docs(text) INDEXTYPE IS CTXSYS.CONTEXT;
When you use the CREATE INDEX
statement without explicitly specifying parameters, the system completes the following actions by default for all languages:
-
Assumes that the text to be indexed is stored directly in a text column. The text column can be of type
CLOB,
BLOB,
BFILE,
VARCHAR2,
orCHAR.
-
Detects the column type and uses filtering for the binary column types of
BLOB
andBFILE.
Most document formats are supported for filtering. If your column is plain text, the system does not use filtering. -
Assumes that the language of the text to index is the language specified in your database setup.
-
Uses the default stoplist for the language specified in your database setup. Stoplists identify the words that the system ignores during indexing.
-
Enables fuzzy and stemming queries for your language, if this feature is available for your language.
You can always change the default indexing behavior by customizing your preferences and specifying those preferences in the parameter string of CREATE
INDEX.
See Also:
Oracle Text Reference to learn more about configuring your environment to use the AUTO_FILTER
filter
4.5.3 Incrementally Creating a CONTEXT Index
The ALTER INDEX
and CREATE INDEX
statements support incrementally creating a CONTEXT
index.
You can incrementally create Oracle Text indexes, which means that the index structure is immediately created but the data is not populated during the index creation or rebuild process. You populate the index later at a suitable time. This procedure is useful for creating indexes in large installations that cannot afford to have the indexing process running continuously. It provides finer control over the creation of indexes, allowing you to avoid building indexes in a single operation.
Incremental index creation involves the following steps:
-
Create an empty index:
If you specify the
NOPOPULATE
keyword at the time of index creation or rebuild, it only creates metadata for the index tables but does not populate them.-
Global index:
For a global index, use
CREATE
INDEX
to support theNOPOPULATE
keyword in theREPLACE
parameter of theREBUILD
clause. -
Local index partition:
For a local index partition, modify the
ALTER
INDEX
...REBUILD
partition
...parameters
('REPLACE
...') parameter string to support theNOPOPULATE
keyword.For a partition on a local index,
CREATE
INDEX
...LOCAL
... (partition
...parameters
('NOPOPULATE
')) is supported. The partition-levelPOPULATE
orNOPOPULATE
keywords override anyPOPULATE
orNOPOPULATE
specified at the index level.
-
-
Place all ROWIDs into the pending queue:
Use the
CTX_DDL.POPULATE_PENDING
procedure to populate the pending queues with every ROWID in the base table or table partition. -
Populate the index:
Use the
CTX_DDL.SYNC_INDEX
procedure to populate the index with the queued data.The
SYNC_INDEX
procedure includes themaxtime
argument that indicates a suggested time limit in minutes for the operation. The indexing process runs in an estimate of the givenmaxtime
instead of running to completion. You might need to run multipleSYNC_INDEX
calls until the index is fully synced.You can choose to run both the
POPULATE_PENDING
andSYNC_INDEX
calls separately so that the population of the pending queue and the population of the index happen at different times, thereby optimizing system performance.
Example 4-1 Incrementally Build an Empty Global Index
-- Create an empty index
CREATE INDEX ctx_ind ON ctx_tab(doc) INDEXTYPE IS CTXSYS.CONTEXT
PARAMETERS ('NOPOPULATE');
declare
n_pending number;
function get_pending return number is
n_pending number;
begin
n_pending := 0;
begin
execute immediate 'SELECT COUNT(*) FROM DR$CTX_IND$C' into n_pending;
exception when others then
if (sqlcode != -942) then
raise;
end if;
end;
if (n_pending = 0) then
execute immediate 'SELECT COUNT(*) FROM CTX_USER_PENDING WHERE PND_INDEX_NAME = :1'
into n_pending using 'CTX_IND';
end if;
return n_pending;
end get_pending;
begin
-- Fill in the pending queue
CTX_DDL.POPULATE_PENDING('CTX_IND');
n_pending := get_pending;
while (n_pending > 0) loop
-- Populate the index through sync_index
CTX_DDL.SYNC_INDEX('CTX_IND', maxtime => 1);
n_pending := get_pending;
end loop;
end;
/
Related Topics
4.5.4 Custom CONTEXT Index Example: Indexing HTML Documents
To index an HTML document set located by URLs, specify the system-defined preference for the NULL_FILTER
in the CREATE INDEX
statement.
You can also specify your htmgroup
section group that uses HTML_SECTION_GROUP
and NETWORK_PREF
datastore that uses NETWORK_DATASTORE
:
begin ctx_ddl.create_preference('NETWORK_PREF','NETWORK_DATASTORE'); ctx_ddl.set_attribute('NETWORK_PREF','HTTP_PROXY','www-proxy.us.example.com'); ctx_ddl.set_attribute('NETWORK_PREF','NO_PROXY','us.example.com'); ctx_ddl.set_attribute('NETWORK_PREF','TIMEOUT','300'); end; begin ctx_ddl.create_section_group('htmgroup', 'HTML_SECTION_GROUP'); ctx_ddl.add_zone_section('htmgroup', 'heading', 'H1'); end;
You can then index your documents:
CREATE INDEX myindex on docs(htmlfile) indextype is ctxsys.context parameters( 'datastore NETWORK_PREF filter ctxsys.null_filter section group htmgroup' );
Note:
Starting with Oracle Database 19c, the Oracle Text type
URL_DATASTORE
is deprecated. Use
NETWORK_DATASTORE
instead.
Related Topics
4.5.5 CONTEXT Index Example: Query Processing with FILTER BY and ORDER BY
To enable more efficient query processing and better response time for mixed queries, use FILTER
BY
and ORDER
BY
clauses as shown in the following example:
CREATE INDEX myindex on docs(text) INDEXTYPE is CTXSYS.CONTEXT FILTER BY category, publisher, pub_date ORDER BY pub_date desc;
Because you specified the FILTER
BY
category,
publisher,
pub_date
clause at query time, Oracle Text also considers pushing a relational predicate on any of these columns into the Oracle Text index row source.
Also, when the query has matching ORDER
BY
criteria, by specifying ORDER
BY
pub_date desc,
Oracle Text determines whether to push SORT
into the Oracle Text index row source for better response time.