10.9 Frequently Asked Questions About Query Performance
This section answers some of the frequently asked questions about query performance.
10.9.1 What is query performance?
Answer: There are two measures of query performance:
-
Response time: The time to get an answer to an individual query
-
Throughput: The number of queries that can be run in any given time period; for example, queries each second
These two measures are related, but they are not the same. In a heavily loaded system, you want maximum throughput, whereas in a relatively lightly loaded system, you probably want minimum response time. Also, some applications require a query to deliver all hits to the user, whereas others only require the first 20 hits from an ordered set. It is important to distinguish between these two scenarios.
10.9.2 What is the fastest type of Oracle Text query?
Answer: The fastest type of query meets the following conditions:
-
Single
CONTAINS
clause -
No other conditions in the
WHERE
clause -
No
ORDER
BY
clause -
Returns only the first page of results (for example, the first 10 or 20 hits)
10.9.3 Should I collect statistics on my tables?
Answer: Yes. Collecting statistics on your tables enables Oracle Text to do cost-based analysis. This helps Oracle Text choose the most efficient execution plan for your queries.
If your queries are always pure text queries (no structured predicate and no joins), you should delete statistics on your Oracle Text index.
10.9.4 How does the size of my data affect queries?
Answer: The speed at which the Oracle Text index can deliver rowids is not affected by the actual size of the data. Oracle Text query speed is related to the number of rows that must be fetched from the index table, the number of hits requested, the number of hits produced by the query, and the presence or absence of sorting.
10.9.5 How does the format of my data affect queries?
Answer: The format of the documents (plain ASCII text, HTML, or Microsoft Word) should make no difference to query speed. The documents are filtered to plain text at indexing time, not query time.
The cleanliness of the data makes a difference. Spell-checked and subedited text for publication tends to have a much smaller total vocabulary (and therefore size of the index table) than informal text such as email, which contains spelling errors and abbreviations. For a given index memory setting, the extra text takes up memory, creates more fragmented rows, and adversely affects query response time.
10.9.6 What is the difference between an indexed lookup and a functional lookup
Answer: The kernel can query the Oracle Text index with an indexed lookup and a functional lookup. In the indexed lookup, the first and most common case, the kernel asks the Oracle Text index for all rowids that satisfy a particular text search. These rowids are returned in batches.
In the functional lookup, the kernel passes individual rowids to the Oracle Text index and asks whether that particular rowid satisfies a certain text criterion. The functional lookup is most commonly used with a very selective structured clause, so that only a few rowids must be checked against the Oracle Text index. Here is an example of a search where a functional lookup is useful:
SELECT ID, SCORE(1), TEXT FROM MYTABLE
WHERE START_DATE = '21 Oct 1992' <- highly selective AND CONTAINS (TEXT, 'commonword') > 0 <- unselective
Functional invocation is also used for an Oracle Text query that is ordered by a structured column (for example date, price) and if the Oracle Text query contains unselective words.
10.9.7 What tables are involved in queries?
Answer: All queries look at the index token table. The table’s name has the form of DR$indexname$I
and contains the list of tokens (TOKEN_TEXT
column) and the information about the row and word positions where the token occurs (TOKEN_INFO
column).
The row information is stored as internal docid values that must be translated into external rowid values. The table that you use depends on the type of lookup:
-
For functional lookups, use the
$K
table,DR$indexname$K.
This simple Index Organized Table (IOT) contains a row for each docid/rowid pair. -
For indexed lookups, use the
$R
table,DR$indexname$R.
This table holds the complete list of rowids in a BLOB column.
Starting with Oracle Database 12c Release 2 (12.2), a new storage attribute, SMALL_R_ROW,
was introduced to reduce the size of the $R
row. It populates $R
rows on demand instead of creating 22 static rows, thereby reducing the Data Manipulation Language contention. The contention happens when parallel insert, update, and delete operations try to lock the same $R
row.
You can easily find out whether a functional or indexed lookup is being used by examining a SQL trace and looking for the $K
or $R
tables.
Note:
These internal index tables are subject to change from release to release. Oracle recommends that you do not directly access these tables in your application.
10.9.8 How is the $R table contention reduced?
$R
contention during base table delete and update operations has become a recurring theme over the past few years. Currently, each $R
index table has 22 static rows, and each row can contain up to 200 million rowids. The contention happens when the parallel insert, update, and delete operations try to lock the same $R
row for insert or delete operations. The following enhancements made during this release reduce the contention:
-
The maximum number of rowids that each
$R
row can contain is 70,000, which translates to 1 MB of data stored on each row. To use this feature, you must set theSMALL_R_ROW
storage attribute. -
The
$R
rows are created on demand instead of just populating a pre-determined number of rows.
10.9.9 Does sorting the results slow a text-only query?
Answer: Yes, it certainly does.
If Oracle Text does not sort, then it can return results as it finds them. This approach is quicker when the application needs to display only a page of results at a time.
10.9.10 How do I make an ORDER BY score query faster?
Answer: Sorting by relevance (SCORE(n)
) can be fast if you use the FIRST_ROWS(n)
hint. In this case, Oracle Text performs a high-speed internal sort when fetching from the Oracle Text index tables.
Here is an example of this query:
SELECT /*+ FIRST_ROWS(10) */ ID, SCORE(1), TEXT FROM mytable WHERE CONTAINS (TEXT, 'searchterm', 1) > 0 ORDER BY SCORE(1) DESC;
It is important to note that, there must be no other criteria in the WHERE
clause, other than a single CONTAINS.
10.9.11 Which memory settings affect querying?
Answer: For querying, you want to strive for a large system global area (SGA). You can set these SGA parameters in your Oracle Database initialization file. You can also set these parameters dynamically.
The SORT_AREA_SIZE
parameter controls the memory that is available for sorting ORDER BY
queries. You should increase the size of this parameter if you frequently order by structured columns.
See Also:
-
Oracle Database Administrator's Guide for more information on setting SGA related parameters
-
Oracle Database Performance Tuning Guide for more information on memory allocation
-
Oracle Database Reference for more information on setting the
SORT_AREA_SIZE
parameter
10.9.12 Does out-of-line LOB storage of wide base table columns improve performance?
Answer: Yes. Typically, a SELECT
statement selects more than one column from your base table. Because Oracle Text fetches columns to memory, it is more efficient to store wide base table columns such as large objects (LOBs) out of line, especially when these columns are rarely updated but frequently selected.
When LOBs are stored out of line, only the LOB locators need to be fetched to memory during querying. Out-of-line storage reduces the effective size of the base table. It makes it easier for Oracle Text to cache the entire table to memory, and so reduces the cost of selecting columns from the base table, and speeds up text queries.
In addition, smaller base tables cached in memory enables more index table data to be cached during querying, which improves performance.
10.9.13 How can I speed up a CONTAINS query on more than one column?
Answer: The fastest type of query is one where there is only a single CONTAINS
clause and no other conditions in the WHERE
clause.
Consider the following multiple CONTAINS
query:
SELECT title, isbn FROM booklist WHERE CONTAINS (title, 'horse') > 0 AND CONTAINS (abstract, 'racing') > 0
You can get the same result with section searching and the WITHIN
operator:
SELECT title, isbn FROM booklist WHERE CONTAINS (alltext, 'horse WITHIN title AND racing WITHIN abstract')>0
This query is completed more quickly than the single CONTAINS
clause. To use a query like this, you must copy all data into a single text column for indexing, with section tags around each column's data. You can do that with PL/SQL procedures before indexing, or you can use the USER_DATASTORE
datastore during indexing to synthesize structured columns with the text column into one document.
10.9.14 Can I have many expansions in a query?
Answer: Each distinct word used in a query requires at least one row to be fetched from the index table. It is therefore best to keep the number of expansions down as much as possible.
You should not use expansions such as wild cards, thesaurus, stemming, and fuzzy matching unless they are necessary to the task. In general, a few expansions (for example, 10 to 20) does not cause difficulty, but avoid a large number of expansions (80 or 100) in a query. Use the query feedback mechanism to determine the number of expansions for any particular query expression.
For wildcard and stem queries, you can avoid term expansion from query time to index time by creating prefix, substring, or stem indexes. Query performance increases at the cost of longer indexing time and added disk space.
Prefix and substring indexes can improve wildcard performance. You enable prefix and substring indexing with the BASIC_WORDLIST
preference. The following example sets the wordlist preference for prefix and substring indexing. For prefix indexing, it specifies that Oracle Text creates token prefixes between 3 and 4 characters long:
begin
ctx_ddl.create_preference('mywordlist', 'BASIC_WORDLIST'); ctx_ddl.set_attribute('mywordlist','PREFIX_INDEX','TRUE'); ctx_ddl.set_attribute('mywordlist','PREFIX_MIN_LENGTH', '3'); ctx_ddl.set_attribute('mywordlist','PREFIX_MAX_LENGTH', '4'); ctx_ddl.set_attribute('mywordlist','SUBSTRING_INDEX', 'YES');
end
Enable stem indexing with the BASIC_LEXER
preference:
begin
ctx_ddl.create_preference('mylex', 'BASIC_LEXER'); ctx_ddl.set_attribute ( 'mylex', 'index_stems', 'ENGLISH');
end;
10.9.15 How can local partition indexes help?
Answer: You can create local partitioned CONTEXT
indexes on partitioned tables. This means that, on a partitioned table, each partition has its own set of index tables. Effectively, the results from the multiple indexes are combined as necessary to produce the final result set.
Use the LOCAL
keyword to create the index:
CREATE INDEX index_name ON table_name (column_name) INDEXTYPE IS ctxsys.context PARAMETERS ('...') LOCAL
With partitioned tables and local indexes, you can improve performance of the following types of CONTAINS
queries:
-
Range Search on Partition Key Column: This query restricts the search to a particular range of values on a column that is also the partition key.
-
ORDER BY Partition Key Column: This query requires only the first
n
hits, and theORDER BY
clause names the partition key.
10.9.16 Should I query in parallel?
Answer: It depends on system load and server capacity. Even though parallel querying is the default behavior for indexes created in parallel, it usually degrades the overall query throughput on heavily loaded systems.
Parallel queries are optimal for Decision Support System (DSS). They are also optimal for analytical systems that have large data collections, multiple CPUs with a low number of concurrent users, or Oracle Real Application Clusters (Oracle RAC) nodes.
Related Topics
10.9.17 Should I index themes?
Answer: Indexing theme information with a CONTEXT
index takes longer and also increases the size of your index. However, theme indexes enable ABOUT
queries to be more precise by using the knowledge base. If your application uses many ABOUT
queries, it might be worthwhile to create a theme component to the index, despite the extra indexing time and extra storage space required.
See Also:
10.9.18 When should I use a CTXCAT index?
Answer: CTXCAT
indexes work best when the text is in small chunks (just a few lines), and you want searches to restrict or sort the result set according to certain structured criteria, such as numbers or dates.
For example, consider an online auction site. Each item for sale has a short description, a current bid price, and start and end dates for the auction. You want to see all records with antique cabinet in the description, with a current bid price less than $500. Because you are particularly interested in newly posted items, you want the results sorted by auction start time.
This search is not always efficient with a CONTAINS
structured query on a CONTEXT
index. The response time can vary significantly depending on the structured and CONTAINS
clauses, because the intersection of structured and CONTAINS
clauses or the Oracle Text query ordering is computed during query time.
By including structured information within the CTXCAT
index, you ensure that the query response time is always in an optimal range regardless of search criteria. Because the interaction between text and structured query is precomputed during indexing, query response time is optimum.
Note:
The Oracle Text indextypeCTXCAT
is deprecated with Oracle Database 23ai. The indextype itself, and it's operator CTXCAT
, can be removed in a future release.Both CTXCAT
and the use of CTXCAT
grammar as an alternative grammar for CONTEXT
queries is deprecated. Instead, Oracle recommends that you use the CONTEXT
indextype, which can provide all the same functionality, except that it is not transactional. Near-transactional behavior in CONTEXT
can be achieved by using SYNC(ON COMMIT)
or, preferably, SYNC(EVERY [time-period])
with a short time period.
CTXCAT
was introduced when indexes were typically a few megabytes in size. Modern, large indexes, can be difficult to manage with CTXCAT
. The addition of index sets to CTXCAT
can be achieved more effectively by the use of FILTER BY
and ORDER BY
columns, or SDATA
, or both, in the CONTEXT
indextype. CTXCAT
is therefore rarely an appropriate choice. Oracle recommends that you choose the more efficient CONTEXT
indextype.
10.9.19 When is a CTXCAT index NOT suitable?
Answer: There are differences in the time and space needed to create the index. CTXCAT
indexes take a bit longer to create, and they use considerably more disk space than CONTEXT
indexes.
If you are tight on disk space, consider carefully whether CTXCAT
indexes are appropriate for you.
With query operators, you can use the richer CONTEXT
grammar in CATSEARCH
queries with query templates. The older restriction of a single CATSEARCH
query grammar no longer holds.
Note:
The Oracle Text indextypeCTXCAT
is deprecated with Oracle Database 23ai. The indextype itself, and it's operator CTXCAT
, can be removed in a future release.Both CTXCAT
and the use of CTXCAT
grammar as an alternative grammar for CONTEXT
queries is deprecated. Instead, Oracle recommends that you use the CONTEXT
indextype, which can provide all the same functionality, except that it is not transactional. Near-transactional behavior in CONTEXT
can be achieved by using SYNC(ON COMMIT)
or, preferably, SYNC(EVERY [time-period])
with a short time period.
CTXCAT
was introduced when indexes were typically a few megabytes in size. Modern, large indexes, can be difficult to manage with CTXCAT
. The addition of index sets to CTXCAT
can be achieved more effectively by the use of FILTER BY
and ORDER BY
columns, or SDATA
, or both, in the CONTEXT
indextype. CTXCAT
is therefore rarely an appropriate choice. Oracle recommends that you choose the more efficient CONTEXT
indextype.
10.9.20 What optimizer hints are available and what do they do?
Answer: To drive the query with a text or b-tree index, you can use the INDEX(table column)
optimizer hint in the usual way.
You can also use the NO_INDEX(table column)
hint to disable a specific index.
The FIRST_ROWS(n)
hint has a special meaning for text queries. Use it when you need the first n
hits to a query. When you use the DOMAIN_INDEX_SORT
hint in conjunction with ORDER BY SCORE(n) DESC,
you tell the Oracle optimizer to accept a sorted set from the Oracle Text index and to sort no farther.
See Also: