Frequently Asked Questions About Query Performance

10.9.1 What is query performance?

Answer: There are two measures of query performance:

Response time: The time to get an answer to an individual query
Throughput: The number of queries that can be run in any given time period; for example, queries each second

These two measures are related, but they are not the same. In a heavily loaded system, you want maximum throughput, whereas in a relatively lightly loaded system, you probably want minimum response time. Also, some applications require a query to deliver all hits to the user, whereas others only require the first 20 hits from an ordered set. It is important to distinguish between these two scenarios.

10.9.2 What is the fastest type of Oracle Text query?

Answer: The fastest type of query meets the following conditions:

Single CONTAINS clause
No other conditions in the WHERE clause
No ORDER BY clause
Returns only the first page of results (for example, the first 10 or 20 hits)

10.9.3 Should I collect statistics on my tables?

Answer: Yes. Collecting statistics on your tables enables Oracle Text to do cost-based analysis. This helps Oracle Text choose the most efficient execution plan for your queries.

If your queries are always pure text queries (no structured predicate and no joins), you should delete statistics on your Oracle Text index.

10.9.4 How does the size of my data affect queries?

Answer: The speed at which the Oracle Text index can deliver rowids is not affected by the actual size of the data. Oracle Text query speed is related to the number of rows that must be fetched from the index table, the number of hits requested, the number of hits produced by the query, and the presence or absence of sorting.

10.9.5 How does the format of my data affect queries?

Answer: The format of the documents (plain ASCII text, HTML, or Microsoft Word) should make no difference to query speed. The documents are filtered to plain text at indexing time, not query time.

The cleanliness of the data makes a difference. Spell-checked and subedited text for publication tends to have a much smaller total vocabulary (and therefore size of the index table) than informal text such as email, which contains spelling errors and abbreviations. For a given index memory setting, the extra text takes up memory, creates more fragmented rows, and adversely affects query response time.

10.9.6 What is the difference between an indexed lookup and a functional lookup

Answer: The kernel can query the Oracle Text index with an indexed lookup and a functional lookup. In the indexed lookup, the first and most common case, the kernel asks the Oracle Text index for all rowids that satisfy a particular text search. These rowids are returned in batches.

In the functional lookup, the kernel passes individual rowids to the Oracle Text index and asks whether that particular rowid satisfies a certain text criterion. The functional lookup is most commonly used with a very selective structured clause, so that only a few rowids must be checked against the Oracle Text index. Here is an example of a search where a functional lookup is useful:

SELECT ID, SCORE(1), TEXT FROM MYTABLE

WHERE START_DATE = '21 Oct 1992'         <- highly selective
AND CONTAINS (TEXT, 'commonword') > 0    <- unselective

Functional invocation is also used for an Oracle Text query that is ordered by a structured column (for example date, price) and if the Oracle Text query contains unselective words.

10.9.7 What tables are involved in queries?

Answer: All queries look at the index token table. The table’s name has the form of DR$indexname$I and contains the list of tokens (TOKEN_TEXT column) and the information about the row and word positions where the token occurs (TOKEN_INFO column).

The row information is stored as internal docid values that must be translated into external rowid values. The table that you use depends on the type of lookup:

For functional lookups, use the $K table, DR$indexname$K. This simple Index Organized Table (IOT) contains a row for each docid/rowid pair.
For indexed lookups, use the $R table, DR$indexname$R. This table holds the complete list of rowids in a BLOB column.

Starting with Oracle Database 12c Release 2 (12.2), a new storage attribute, SMALL_R_ROW, was introduced to reduce the size of the $R row. It populates $R rows on demand instead of creating 22 static rows, thereby reducing the Data Manipulation Language contention. The contention happens when parallel insert, update, and delete operations try to lock the same $R row.

You can easily find out whether a functional or indexed lookup is being used by examining a SQL trace and looking for the $K or $R tables.

Note:

These internal index tables are subject to change from release to release. Oracle recommends that you do not directly access these tables in your application.

10.9.9 Does sorting the results slow a text-only query?

Answer: Yes, it certainly does.

If Oracle Text does not sort, then it can return results as it finds them. This approach is quicker when the application needs to display only a page of results at a time.

10.9.10 How do I make an ORDER BY score query faster?

Answer: Sorting by relevance (SCORE(n)) can be fast if you use the FIRST_ROWS(n) hint. In this case, Oracle Text performs a high-speed internal sort when fetching from the Oracle Text index tables.

Here is an example of this query:

              SELECT /*+ FIRST_ROWS(10) */ ID, SCORE(1), TEXT FROM mytable
                WHERE CONTAINS (TEXT, 'searchterm', 1) > 0
                ORDER BY SCORE(1) DESC;

It is important to note that, there must be no other criteria in the WHERE clause, other than a single CONTAINS.

10.9.11 Which memory settings affect querying?

Answer: For querying, you want to strive for a large system global area (SGA). You can set these SGA parameters in your Oracle Database initialization file. You can also set these parameters dynamically.

The SORT_AREA_SIZE parameter controls the memory that is available for sorting ORDER BY queries. You should increase the size of this parameter if you frequently order by structured columns.

10.9.12 Does out-of-line LOB storage of wide base table columns improve performance?

Answer: Yes. Typically, a SELECT statement selects more than one column from your base table. Because Oracle Text fetches columns to memory, it is more efficient to store wide base table columns such as large objects (LOBs) out of line, especially when these columns are rarely updated but frequently selected.

When LOBs are stored out of line, only the LOB locators need to be fetched to memory during querying. Out-of-line storage reduces the effective size of the base table. It makes it easier for Oracle Text to cache the entire table to memory, and so reduces the cost of selecting columns from the base table, and speeds up text queries.

In addition, smaller base tables cached in memory enables more index table data to be cached during querying, which improves performance.

10.9.13 How can I speed up a CONTAINS query on more than one column?

Answer: The fastest type of query is one where there is only a single CONTAINS clause and no other conditions in the WHERE clause.

Consider the following multiple CONTAINS query:

              SELECT title, isbn FROM booklist
                WHERE CONTAINS (title, 'horse') > 0
                  AND CONTAINS (abstract, 'racing') > 0

You can get the same result with section searching and the WITHIN operator:

              SELECT title, isbn FROM booklist
                WHERE CONTAINS (alltext, 
                  'horse WITHIN title AND racing WITHIN abstract')>0

This query is completed more quickly than the single CONTAINS clause. To use a query like this, you must copy all data into a single text column for indexing, with section tags around each column's data. You can do that with PL/SQL procedures before indexing, or you can use the USER_DATASTORE datastore during indexing to synthesize structured columns with the text column into one document.

10.9.14 Can I have many expansions in a query?

Answer: Each distinct word used in a query requires at least one row to be fetched from the index table. It is therefore best to keep the number of expansions down as much as possible.

You should not use expansions such as wild cards, thesaurus, stemming, and fuzzy matching unless they are necessary to the task. In general, a few expansions (for example, 10 to 20) does not cause difficulty, but avoid a large number of expansions (80 or 100) in a query. Use the query feedback mechanism to determine the number of expansions for any particular query expression.

For wildcard and stem queries, you can avoid term expansion from query time to index time by creating prefix, substring, or stem indexes. Query performance increases at the cost of longer indexing time and added disk space.

Prefix and substring indexes can improve wildcard performance. You enable prefix and substring indexing with the BASIC_WORDLIST preference. The following example sets the wordlist preference for prefix and substring indexing. For prefix indexing, it specifies that Oracle Text creates token prefixes between 3 and 4 characters long:

begin

ctx_ddl.create_preference('mywordlist', 'BASIC_WORDLIST'); 
ctx_ddl.set_attribute('mywordlist','PREFIX_INDEX','TRUE');
ctx_ddl.set_attribute('mywordlist','PREFIX_MIN_LENGTH', '3');
ctx_ddl.set_attribute('mywordlist','PREFIX_MAX_LENGTH', '4');
ctx_ddl.set_attribute('mywordlist','SUBSTRING_INDEX', 'YES');

end

Enable stem indexing with the BASIC_LEXER preference:

begin

ctx_ddl.create_preference('mylex', 'BASIC_LEXER');
ctx_ddl.set_attribute ( 'mylex', 'index_stems', 'ENGLISH');

end;

10.9.15 How can local partition indexes help?

Answer: You can create local partitioned CONTEXT indexes on partitioned tables. This means that, on a partitioned table, each partition has its own set of index tables. Effectively, the results from the multiple indexes are combined as necessary to produce the final result set.

Use the LOCAL keyword to create the index:

CREATE INDEX index_name ON table_name (column_name) 
INDEXTYPE IS ctxsys.context
PARAMETERS ('...')
LOCAL

With partitioned tables and local indexes, you can improve performance of the following types of CONTAINS queries:

Range Search on Partition Key Column: This query restricts the search to a particular range of values on a column that is also the partition key.
ORDER BY Partition Key Column: This query requires only the first n hits, and the ORDER BY clause names the partition key.

See Also:

"Improved Response Time using Local Partitioned CONTEXT Index"

10.9.16 Should I query in parallel?

Answer: It depends on system load and server capacity. Even though parallel querying is the default behavior for indexes created in parallel, it usually degrades the overall query throughput on heavily loaded systems.

Parallel queries are optimal for Decision Support System (DSS). They are also optimal for analytical systems that have large data collections, multiple CPUs with a low number of concurrent users, or Oracle Real Application Clusters (Oracle RAC) nodes.

Related Topics

Using Parallel Queries

10.9.17 Should I index themes?

Answer: Indexing theme information with a CONTEXT index takes longer and also increases the size of your index. However, theme indexes enable ABOUT queries to be more precise by using the knowledge base. If your application uses many ABOUT queries, it might be worthwhile to create a theme component to the index, despite the extra indexing time and extra storage space required.

10.9.18 When should I use a CTXCAT index?

Answer: CTXCAT indexes work best when the text is in small chunks (just a few lines), and you want searches to restrict or sort the result set according to certain structured criteria, such as numbers or dates.

For example, consider an online auction site. Each item for sale has a short description, a current bid price, and start and end dates for the auction. You want to see all records with antique cabinet in the description, with a current bid price less than $500. Because you are particularly interested in newly posted items, you want the results sorted by auction start time.

This search is not always efficient with a CONTAINS structured query on a CONTEXT index. The response time can vary significantly depending on the structured and CONTAINS clauses, because the intersection of structured and CONTAINS clauses or the Oracle Text query ordering is computed during query time.

By including structured information within the CTXCAT index, you ensure that the query response time is always in an optimal range regardless of search criteria. Because the interaction between text and structured query is precomputed during indexing, query response time is optimum.

Note:

The Oracle Text indextype CTXCAT is deprecated with Oracle Database 23ai. The indextype itself, and it's operator CTXCAT, can be removed in a future release.

Both CTXCAT and the use of CTXCAT grammar as an alternative grammar for CONTEXT queries is deprecated. Instead, Oracle recommends that you use the CONTEXT indextype, which can provide all the same functionality, except that it is not transactional. Near-transactional behavior in CONTEXT can be achieved by using SYNC(ON COMMIT) or, preferably, SYNC(EVERY [time-period]) with a short time period.

CTXCAT was introduced when indexes were typically a few megabytes in size. Modern, large indexes, can be difficult to manage with CTXCAT. The addition of index sets to CTXCAT can be achieved more effectively by the use of FILTER BY and ORDER BY columns, or SDATA, or both, in the CONTEXT indextype. CTXCAT is therefore rarely an appropriate choice. Oracle recommends that you choose the more efficient CONTEXT indextype.

10.9.19 When is a CTXCAT index NOT suitable?

Answer: There are differences in the time and space needed to create the index. CTXCAT indexes take a bit longer to create, and they use considerably more disk space than CONTEXT indexes.

If you are tight on disk space, consider carefully whether CTXCAT indexes are appropriate for you.

With query operators, you can use the richer CONTEXT grammar in CATSEARCH queries with query templates. The older restriction of a single CATSEARCH query grammar no longer holds.

Note:

The Oracle Text indextype CTXCAT is deprecated with Oracle Database 23ai. The indextype itself, and it's operator CTXCAT, can be removed in a future release.

Both CTXCAT and the use of CTXCAT grammar as an alternative grammar for CONTEXT queries is deprecated. Instead, Oracle recommends that you use the CONTEXT indextype, which can provide all the same functionality, except that it is not transactional. Near-transactional behavior in CONTEXT can be achieved by using SYNC(ON COMMIT) or, preferably, SYNC(EVERY [time-period]) with a short time period.

CTXCAT was introduced when indexes were typically a few megabytes in size. Modern, large indexes, can be difficult to manage with CTXCAT. The addition of index sets to CTXCAT can be achieved more effectively by the use of FILTER BY and ORDER BY columns, or SDATA, or both, in the CONTEXT indextype. CTXCAT is therefore rarely an appropriate choice. Oracle recommends that you choose the more efficient CONTEXT indextype.

10.9.20 What optimizer hints are available and what do they do?

Answer: To drive the query with a text or b-tree index, you can use the INDEX(table column) optimizer hint in the usual way.

You can also use the NO_INDEX(table column) hint to disable a specific index.

The FIRST_ROWS(n) hint has a special meaning for text queries. Use it when you need the first n hits to a query. When you use the DOMAIN_INDEX_SORT hint in conjunction with ORDER BY SCORE(n) DESC, you tell the Oracle optimizer to accept a sorted set from the Oracle Text index and to sort no farther.

10.9 Frequently Asked Questions About Query Performance

10.9.1 What is query performance?

10.9.2 What is the fastest type of Oracle Text query?

10.9.3 Should I collect statistics on my tables?

10.9.4 How does the size of my data affect queries?

10.9.5 How does the format of my data affect queries?

10.9.6 What is the difference between an indexed lookup and a functional lookup

10.9.7 What tables are involved in queries?

10.9.8 How is the $R table contention reduced?

10.9.9 Does sorting the results slow a text-only query?

10.9.10 How do I make an ORDER BY score query faster?

10.9.11 Which memory settings affect querying?

10.9.12 Does out-of-line LOB storage of wide base table columns improve performance?

10.9.13 How can I speed up a CONTAINS query on more than one column?

10.9.14 Can I have many expansions in a query?

10.9.15 How can local partition indexes help?

10.9.16 Should I query in parallel?

10.9.17 Should I index themes?

10.9.18 When should I use a CTXCAT index?

10.9.19 When is a CTXCAT index NOT suitable?

10.9.20 What optimizer hints are available and what do they do?