10.1 Optimizing Queries with Statistics

Query optimization with statistics uses the collected statistics on the tables and indexes in a query to select an execution plan that can process the query in the most efficient manner. As a general rule, Oracle recommends that you collect statistics on your base table if you are interested in improving your query performance. Optimizing with statistics enables a more accurate estimation of the selectivity and costs of the CONTAINS predicate and thus a better execution plan.

The optimizer attempts to choose the best execution plan based on the following parameters:

  • The selectivity on the CONTAINS predicate

  • The selectivity of other predicates in the query

  • The CPU and I/O costs of processing the CONTAINS predicates

The following topics discuss how to use statistics with the extensible query optimizer:

Note:

Importing and exporting statistics on domain indexes, including Oracle Text indexes, is not supported with the DBMS_STATS package. For more information on importing and exporting statistics, see the Oracle Database PL/SQL Packages and Types Reference.

See Also:

Oracle Text Reference for information on the CONTAINS query operator

10.1.1 Collecting Statistics

By default, Oracle Text uses the cost-based optimizer (CBO) to determine the best execution plan for a query.

To enable the optimizer to better estimate costs, calculate the statistics on the table you queried table:

ANALYZE TABLE <table_name> COMPUTE STATISTICS;

Alternatively, estimate the statistics on a sample of the table:

ANALYZE TABLE <table_name> ESTIMATE STATISTICS 1000 ROWS;

or

ANALYZE TABLE <table_name> ESTIMATE STATISTICS 50 PERCENT;

You can also collect statistics in parallel with the DBMS_STATS.GATHER_TABLE_STATS procedure:

begin
 
DBMS_STATS.GATHER_TABLE_STATS('owner', 'table_name',
                                       estimate_percent=>50,
                                       block_sample=>TRUE,
                                       degree=>4) ;
 
end  ;

These statements collect statistics on all objects associated with table_name, including the table columns and any indexes (b-tree, bitmap, or Text domain) associated with the table.

To re-collect the statistics on a table, enter the ANALYZE statement as many times as necessary or use the DBMS_STATS package.

By collecting statistics on the Text domain index, the CBO in Oracle Database can perform the following tasks:

  • Estimate the selectivity of the CONTAINS predicate

  • Estimate the I/O and CPU costs of using the Oracle Text index (that is, the cost of processing the CONTAINS predicate by using the domain index)

  • Estimate the I/O and CPU costs of each invocation of CONTAINS

Knowing the selectivity of a CONTAINS predicate is useful for queries that contain more than one predicate, such as in structured queries. This way the CBO can better decide whether to use the domain index to evaluate CONTAINS or to apply the CONTAINS predicate as a post filter.

See Also:

10.1.2 Query Optimization with Statistics Example

The following structured query provides an example for optimizing statistics:

select score(1) from tab where contains(txt, 'freedom', 1)  > 0 and author = 'King' and year > 1960;

Assume the following:

  • The author column is of type VARCHAR2 and the year column is of type NUMBER.

  • A b-tree index on the author column.

  • The structured author predicate is highly selective with respect to the CONTAINS predicate and the year predicate. That is, the structured predicate (author = 'King') returns a much smaller number of rows with respect to the year and CONTAINS predicates individually, say 5 rows returned versus 1000 and 1500 rows, respectively.

In this situation, Oracle Text can execute this query more efficiently by first scanning a b-tree index range on the structured predicate (author = 'King'), then accessing a table by rowid, and then applying the other two predicates to the rows returned from the b-tree table access.

Note:

When statistics are not collected for a Oracle Text index, the CBO assumes low selectivity and index costs for the CONTAINS predicate.

10.1.3 Re-Collecting Statistics

After synchronizing your index, you can re-collect statistics on a single index to update the cost estimates.

If your base table was reanalyzed before the synchronization, it is sufficient to analyze the index after the synchronization without reanalyzing the entire table.

To re-collect statistics, enter one of the following statements:

ANALYZE INDEX <index_name> COMPUTE STATISTICS;
ANALYZE INDEX <index_name> ESTIMATE STATISTICS SAMPLE 50 PERCENT;

10.1.4 Deleting Statistics

Delete the statistics associated with a table:

ANALYZE TABLE <table_name> DELETE STATISTICS;

Delete statistics on one index:

ANALYZE INDEX <index_name> DELETE STATISTICS;