10.10 Frequently Asked Questions About Indexing Performance
This section answers some of the frequently asked questions about indexing performance.
10.10.1 How long should indexing take?
Answer: Indexing text is a resource-intensive process. The speed of indexing depends on the power of your hardware. Indexing speed depends on CPU and I/O capacity. With sufficient I/O capacity to read in the original data and write out index entries, the CPU is the limiting factor.
Tests with Intel x86 (Core 2 architecture, 2.5GHz) CPUs have shown that Oracle Text can index around 100 GB of text per CPU core, per day. This speed would be expected to increase as CPU clock speeds increase and CPU architectures become more efficient.
Other factors, such as your document format, location of your data, and the calls to user-defined datastores, filters, and lexers, can affect your indexing speed.
10.10.2 Which index memory settings should I use?
Answer: You can set your index memory with the DEFAULT_INDEX_MEMORY
and MAX_INDEX_MEMORY
system parameters. You can also set your index memory at runtime with the CREATE INDEX
memory
parameter in the parameter string.
You should aim to set the DEFAULT_INDEX_MEMORY
value as high as possible, without causing paging.
You can also improve indexing performance by increasing the SORT_AREA_SIZE
system parameter.
Oracle recommends that you use a large index memory setting. Large settings, even up to hundreds of megabytes, can improve the speed of indexing and reduce fragmentation of the final indexes. However, if you set the index memory setting too high, then memory paging reduces indexing speed.
With parallel indexing, each stream requires its own index memory. When dealing with very large tables, you can tune your database system global area (SGA) differently for indexing and retrieval. For querying, you want to get as much information cached in the SGA block buffer cache as possible. So you should allocate a large amount of memory to the block buffer cache. Because this approach does not make any difference to indexing, you would be better off reducing the size of the SGA to make more room for large index memory settings during indexing.
You set the size of SGA in your Oracle Database initialization file.
See Also:
-
Oracle Text Reference to learn more about Oracle Text system parameters
-
Oracle Database Administrator's Guide for more information on setting SGA related parameters
-
Oracle Database Performance Tuning Guide for more information on memory allocation
-
Oracle Database Reference for more information on setting the
SORT_AREA_SIZE
parameter
10.10.3 How much disk overhead will indexing require?
Answer: The overhead, the amount of space needed for the index tables, varies between about 50 and 200 percent of the original text volume. Generally, larger amounts of text result in smaller overhead, but many small records use more overhead than fewer large records. Also, clean data (such as published text) requires less overhead than dirty data such as emails or discussion notes, because the dirty data is likely to include many misspelled and abbreviated words.
A text-only index is smaller than a combined text and theme index. A prefix and substring index makes the index significantly larger.
10.10.4 How does the format of my data affect indexing?
Answer: You can expect much lower storage overhead for formatted documents such as Microsoft Word files because the documents tend to be very large compared to the actual text held in them. So 1 GB of Word documents might only require 50 MB of index space, whereas 1 GB of plain text might require 500 MB, because there is ten times as much plain text in the latter set.
Indexing time is less clear-cut. Although the reduction in the amount of text to be indexed has an obvious effect, you must balance this against the cost of filtering the documents with the AUTO_FILTER
filter or other user-defined filters.
10.10.5 Can parallel indexing improve performance?
Answer: Parallel indexing can improve index performance when you have a large amount of data and multiple CPUs.
Use the PARALLEL
keyword to create an index with up to three separate indexing processes, depending on your resources.
CREATE INDEX index_name ON table_name (column_name) INDEXTYPE IS ctxsys.context PARAMETERS ('...') PARALLEL 3;
You can also use parallel indexing to create local partitioned indexes on partitioned tables. However, indexing performance improves only with multiple CPUs.
Note:
Using PARALLEL
to create a local partitioned index enables parallel queries. (Creating a nonpartitioned index in parallel does not turn on parallel query processing.)
Parallel querying degrades query throughput especially on heavily loaded systems. Because of this, Oracle recommends that you disable parallel querying after parallel indexing. To do so, use ALTER INDEX NOPARALLEL.
10.10.6 How can I improve index performance when I create a local partitioned index?
Answer: When you have multiple CPUs, you can improve indexing performance by creating a local index in parallel.
You can create a local partitioned index in parallel in the following ways:
-
Use the
PARALLEL
clause with theLOCAL
clause in theCREATE INDEX
statement. In this case, the maximum parallel degree is limited to the number of partitions. -
Create an unusable index, and then run the
DBMS_PCLXUTIL.BUILD_PART_INDEX
utility. This method can result in a higher degree of parallelism, especially if you have more CPUs than partitions.
The following is an example of the second method. The base table has three partitions. You create a local partitioned unusable index first, and then run the DBMS_PCLUTIL.BUILD_PART_INDEX
, to build the three partitions in parallel (inter-partition parallelism). Inside each partition, index creation occurs in parallel (intra-partition parallelism) with a parallel degree of 2.
create index tdrbip02bx on tdrbip02b(text) indextype is ctxsys.context local (partition tdrbip02bx1, partition tdrbip02bx2, partition tdrbip02bx3) unusable; exec dbms_pclxutil.build_part_index(3,2,'TDRBIP02B','TDRBIP02BX',TRUE);
10.10.7 How can I tell how much indexing has completed?
Answer: You can use the CTX_OUTPUT.START_LOG
procedure to log output from the indexing process. The filename is normally written to $ORACLE_HOME/ctx/log,
but you can change the directory by using the LOG_DIRECTORY
parameter in CTX_ADM.SET_PARAMETER.
See Also:
Oracle Text Reference to learn more about the CTX_OUTPUT
package