4.5 Creating a CONTEXT Index

The CONTEXT index type is well suited for indexing large, coherent documents in formats such as Microsoft Word, HTML, or plain text.

With a CONTEXT index, you can also customize your index in a variety of ways. The documents must be loaded in a text table.

4.5.1 CONTEXT Index and DML

A CONTEXT index is not transactional. When you delete a record, the index is changed immediately. That is, your session no longer finds the record from the moment you make the change, and other users cannot find the record after you commit. For inserts and updates, the new information is not visible to text searches until an index synchronization has occurred. Therefore, when you perform inserts or updates on the base table, you must explicitly synchronize the index with CTX_DDL.SYNC_INDEX.

4.5.2 Default CONTEXT Index Example

The following statement creates a default CONTEXT index called myindex on the text column in the docs table:

CREATE INDEX myindex ON docs(text) INDEXTYPE IS CTXSYS.CONTEXT;

When you use the CREATE INDEX statement without explicitly specifying parameters, the system completes the following actions by default for all languages:

  • Assumes that the text to be indexed is stored directly in a text column. The text column can be of type CLOB, BLOB, BFILE, VARCHAR2, or CHAR.

  • Detects the column type and uses filtering for the binary column types of BLOB and BFILE. Most document formats are supported for filtering. If your column is plain text, the system does not use filtering.

    Note:

    For document filtering to work correctly in your system, you must ensure that your environment is set up correctly to support the AUTO_FILTER filter.

  • Assumes that the language of the text to index is the language specified in your database setup.

  • Uses the default stoplist for the language specified in your database setup. Stoplists identify the words that the system ignores during indexing.

  • Enables fuzzy and stemming queries for your language, if this feature is available for your language.

You can always change the default indexing behavior by customizing your preferences and specifying those preferences in the parameter string of CREATE INDEX.

See Also:

Oracle Text Reference to learn more about configuring your environment to use the AUTO_FILTER filter

4.5.3 Incrementally Creating a CONTEXT Index

The ALTER INDEX and CREATE INDEX statements support incrementally creating a CONTEXT index.

You can incrementally create Oracle Text indexes, which means that the index structure is immediately created but the data is not populated during the index creation or rebuild process. You populate the index later at a suitable time. This procedure is useful for creating indexes in large installations that cannot afford to have the indexing process running continuously. It provides finer control over the creation of indexes, allowing you to avoid building indexes in a single operation.

Incremental index creation involves the following steps:

  1. Create an empty index:

    If you specify the NOPOPULATE keyword at the time of index creation or rebuild, it only creates metadata for the index tables but does not populate them.

    • Global index:

      For a global index, use CREATE INDEX to support the NOPOPULATE keyword in the REPLACE parameter of the REBUILD clause.

    • Local index partition:

      For a local index partition, modify the ALTER INDEX ... REBUILD partition ... parameters ('REPLACE ...') parameter string to support the NOPOPULATE keyword.

      For a partition on a local index, CREATE INDEX ... LOCAL ... (partition ... parameters ('NOPOPULATE')) is supported. The partition-level POPULATE or NOPOPULATE keywords override any POPULATE or NOPOPULATE specified at the index level.

  2. Place all ROWIDs into the pending queue:

    Use the CTX_DDL.POPULATE_PENDING procedure to populate the pending queues with every ROWID in the base table or table partition.

  3. Populate the index:

    Use the CTX_DDL.SYNC_INDEX procedure to populate the index with the queued data.

    The SYNC_INDEX procedure includes the maxtime argument that indicates a suggested time limit in minutes for the operation. The indexing process runs in an estimate of the given maxtime instead of running to completion. You might need to run multiple SYNC_INDEX calls until the index is fully synced.

    You can choose to run both the POPULATE_PENDING and SYNC_INDEX calls separately so that the population of the pending queue and the population of the index happen at different times, thereby optimizing system performance.

Example 4-1 Incrementally Build an Empty Global Index

-- Create an empty index

CREATE INDEX ctx_ind ON ctx_tab(doc) INDEXTYPE IS CTXSYS.CONTEXT
   PARAMETERS ('NOPOPULATE');
declare
 n_pending number;

 function get_pending return number is
   n_pending number;

 begin
   n_pending := 0;
   begin
     execute immediate 'SELECT COUNT(*) FROM DR$CTX_IND$C' into n_pending;
   exception when others then
     if (sqlcode != -942) then
       raise;
     end if;
   end;

   if (n_pending = 0) then
     execute immediate 'SELECT COUNT(*) FROM CTX_USER_PENDING WHERE PND_INDEX_NAME = :1'
         into n_pending using 'CTX_IND';
   end if;
   return n_pending;
 end get_pending;

begin
 -- Fill in the pending queue
 CTX_DDL.POPULATE_PENDING('CTX_IND');
 n_pending := get_pending;
 while (n_pending > 0) loop
  -- Populate the index through sync_index
   CTX_DDL.SYNC_INDEX('CTX_IND', maxtime => 1);
   n_pending := get_pending;
 end loop;
end;
/

4.5.4 Custom CONTEXT Index Example: Indexing HTML Documents

To index an HTML document set located by URLs, specify the system-defined preference for the NULL_FILTER in the CREATE INDEX statement.

You can also specify your htmgroup section group that uses HTML_SECTION_GROUP and NETWORK_PREF datastore that uses NETWORK_DATASTORE:

begin
 ctx_ddl.create_preference('NETWORK_PREF','NETWORK_DATASTORE');
 ctx_ddl.set_attribute('NETWORK_PREF','HTTP_PROXY','www-proxy.us.example.com');
 ctx_ddl.set_attribute('NETWORK_PREF','NO_PROXY','us.example.com');
 ctx_ddl.set_attribute('NETWORK_PREF','TIMEOUT','300');
end;

begin
ctx_ddl.create_section_group('htmgroup', 'HTML_SECTION_GROUP');
ctx_ddl.add_zone_section('htmgroup', 'heading', 'H1');
end;

You can then index your documents:

CREATE INDEX myindex on docs(htmlfile) indextype is ctxsys.context 
parameters(
'datastore NETWORK_PREF filter ctxsys.null_filter section group htmgroup'
);

Note:

Starting with Oracle Database 19c, the Oracle Text type URL_DATASTORE is deprecated. Use NETWORK_DATASTORE instead.

Related Topics

4.5.5 CONTEXT Index Example: Query Processing with FILTER BY and ORDER BY

To enable more efficient query processing and better response time for mixed queries, use FILTER BY and ORDER BY clauses as shown in the following example:

CREATE INDEX myindex on docs(text) INDEXTYPE is CTXSYS.CONTEXT
FILTER BY category, publisher, pub_date
ORDER BY pub_date desc;

Because you specified the FILTER BY category, publisher, pub_date clause at query time, Oracle Text also considers pushing a relational predicate on any of these columns into the Oracle Text index row source.

Also, when the query has matching ORDER BY criteria, by specifying ORDER BY pub_date desc, Oracle Text determines whether to push SORT into the Oracle Text index row source for better response time.