6.2 Oracle Text Query Features

Oracle Text has various query features. You can use these query features in your query application.

6.2.1 Query Expressions

A query expression is everything in between the single quotes in the text_query argument of the CONTAINS or CATSEARCH operator. The contents of a query expression in a CONTAINS query differs from the contents of a CATSEARCH operator.

6.2.1.1 CONTAINS Operators

A CONTAINS query expression can contain query operators that enable logical, proximity, thesaural, fuzzy, and wildcard searching. Querying with stored expressions is also possible. Within the query expression, you can use grouping characters to alter operator precedence. This book refers to these operators as the CONTEXT grammar.

With CONTAINS, you can also use the ABOUT query to query document themes.

See Also:

"The CONTEXT Grammar"

6.2.1.2 CATSEARCH Operator

With the CATSEARCH operator, you specify your query expression with the text_query argument and your optional structured criteria with the structured_query argument. The text_query argument enables you to query words and phrases. You can use logical operations, such as logical and, or, and not. This book refers to these operators as the CTXCAT grammar.

If you want to use the much richer set of operators supported by the CONTEXT grammar, you can use the query template feature with CATSEARCH.

With structured_query argument, you specify your structured criteria. You can use the following SQL operations:

  • =

  • <=

  • >=

  • >

  • <

  • IN

  • BETWEEN

You can also use the ORDER BY clause to order your output.

See Also:

"The CTXCAT Grammar"

6.2.1.3 MATCHES Operator

Unlike CONTAINS and CATSEARCH, MATCHES does not take a query expression as input.

Instead, the MATCHES operator takes a document as input and finds all rows in a query (rule) table that match it. As such, you can use MATCHES to classify documents according to the rules they match.

See Also:

"Querying with MATCHES"

6.2.2 Case-Sensitive Searching

Oracle Text supports case-sensitivity for word and ABOUT queries.

Word queries are not case-insensitive by default. This means that a query on the term dog returns the rows in your text table that contain the word dog, but not Dog or DOG.

You can enable or disable case-sensitive searching with the MIXED_CASE attribute in your BASIC_LEXER index preference. With a case-sensitive index, your queries must be entered in exact case. For example, a query on Dog matches only documents with Dog. Documents with dog or DOG are not returned as hits.

To enable case-insensitive searching, set the MIXED_CASE attribute in your BASIC_LEXER index preference to NO.

Note:

If you enable case-sensitivity for word queries and you query a phrase containing stopwords and indexable words, then you must specify the correct case for the stopwords. For example, a query on the dog does not return text that contains The Dog, assuming that the is a stopword.

ABOUT queries give the best results when your query is formulated with proper case because the normalization of your query is based on the knowledge catalog. The knowledge catalog is case-sensitive. Attention to case is required, especially for words that have different meanings depending on case, such as turkey the bird and Turkey the country.

However, you do not have to enter your query in exact case to obtain relevant results from an ABOUT query. The system does its best to interpret your query. For example, if you enter a query of ORACLE and the system does not find this concept in the knowledge catalog, the system might use Oracle as a related concept for lookup.

6.2.3 Query Feedback

Feedback provides broader-term, narrower term, and related term information for a specified query with a CONTEXT index. You obtain this information programatically with the CTX_QUERY.HFEEDBACK procedure.

Broader term, narrower term, and related term information is useful for suggesting other query terms to the user in your query application.

The returned feedback information is obtained from the knowledge base and contains only those terms that are also in the index. This process increases the chances that terms returned from HFEEDBACK produce hits over the currently indexed document set.

See Also:

Oracle Text Reference for more information about using CTX_QUERY.HFEEDBACK

6.2.4 Query Explain Plan

Explain plan information provides a graphical representation of the parse tree for a CONTAINS query expression. You can obtain this information programatically with the CTX_QUERY.EXPLAIN procedure.

Explain plan information tells you how a query is expanded and parsed without having the system execute the query. Obtaining explain information is useful for knowing the expansion for a particular stem, wildcard, thesaurus, fuzzy, soundex, or ABOUT query. Parse trees also show the following information:

  • Order of execution

  • ABOUT query normalization

  • Query expression optimization

  • Stopword transformations

  • Breakdown of composite-word tokens for supported languages

    See Also:

    Oracle Text Reference for more information about using CTX_QUERY.EXPLAIN

6.2.5 Using a Thesaurus in Queries

Oracle Text enables you to define a thesaurus for your query application and process queries more intelligently.

Because users might not know which words represent a topic, you can define synonyms or narrower terms for likely query terms. You can use the thesaurus operators to expand your query to include thesaurus terms.

6.2.6 Document Section Searching

Section searching enables you to narrow text queries down to sections within documents.

You can implement section searching when your documents have internal structure, such as HTML and XML documents. For example, you can define a section for the <H1> tag that enables you to query within this section by using the WITHIN operator.

You can set the system to automatically create sections from XML documents.

You can also define attribute sections to search attribute text in XML documents.

Note:

Section searching is supported for only word queries with a CONTEXT index.

6.2.7 Using Query Templates

Query templates are an alternative to the existing query languages.

Rather than passing a query string to CONTAINS or CATSEARCH, you pass a structured document that contains the query string in a tagged element. Within this structured document, or query template, you can enable additional query features.

Note:

The Oracle Text indextype CTXCAT is deprecated with Oracle Database 23ai. The indextype itself, and it's operator CTXCAT, can be removed in a future release.

Both CTXCAT and the use of CTXCAT grammar as an alternative grammar for CONTEXT queries is deprecated. Instead, Oracle recommends that you use the CONTEXT indextype, which can provide all the same functionality, except that it is not transactional. Near-transactional behavior in CONTEXT can be achieved by using SYNC(ON COMMIT) or, preferably, SYNC(EVERY [time-period]) with a short time period.

CTXCAT was introduced when indexes were typically a few megabytes in size. Modern, large indexes, can be difficult to manage with CTXCAT. The addition of index sets to CTXCAT can be achieved more effectively by the use of FILTER BY and ORDER BY columns, or SDATA, or both, in the CONTEXT indextype. CTXCAT is therefore rarely an appropriate choice. Oracle recommends that you choose the more efficient CONTEXT indextype.

6.2.7.1 Query Rewrite

Query applications sometimes parse end-user queries, interpreting a query string in one or more ways by using different operator combinations. For example, if a user enters a query of kukui nut, your application enters the {kukui nut} and {kukui or nut} queries to increase recall.

The query rewrite feature enables you to submit a single query that expands the original query into the rewritten versions. The results are returned with no duplication.

You specify your rewrite sequences with the query template feature. The rewritten versions of the query are executed efficiently with a single call to CONTAINS or CATSEARCH.

The following template defines a query rewrite sequence. The query of {kukui nut} is rewritten as follows:

{kukui} {nut}

{kukui} ; {nut}

{kukui} AND {nut}

{kukui} ACCUM {nut}

The following is the query rewrite template for these transformations:

select id from docs where CONTAINS (text,
 '<query>
   <textquery lang="ENGLISH" grammar="CONTEXT"> kukui nut
     <progression>
       <seq><rewrite>transform((TOKENS, "{", "}", " "))</rewrite></seq>
       <seq><rewrite>transform((TOKENS, "{", "}", " ; "))</rewrite></seq>
       <seq><rewrite>transform((TOKENS, "{", "}", "AND"))</rewrite></seq>
       <seq><rewrite>transform((TOKENS, "{", "}", "ACCUM"))</rewrite></seq>
     </progression>
   </textquery>
  <score datatype="INTEGER" algorithm="COUNT"/>
</query>')>0;

6.2.7.2 Query Relaxation

The query relaxation feature enables your application to execute the most restrictive version of a query first and progressively relax the query until the required number of hits are obtained.

For example, your application searches first on green pen and then the query is relaxed to green NEAR pen to obtain more hits.

The following query template defines a query relaxation sequence. The query of green pen is entered in sequence.

{green} {pen}

{green} NEAR {pen}

{green} AND {pen}

{green} ACCUM {pen}

The following is the query relaxation template for these transformations:

select id from docs where CONTAINS (text,
 '<query>
   <textquery lang="ENGLISH" grammar="CONTEXT">
     <progression>
       <seq>{green} {pen}</seq>
       <seq>{green} NEAR {pen}</seq>
       <seq>{green} AND {pen}</seq>
       <seq>{green} ACCUM {pen}</seq>
     </progression>
   </textquery>
   <score datatype="INTEGER" algorithm="COUNT"/>
</query>')>0;

Query hits are returned in this sequence with no duplication as long as the application needs results.

Query relaxation is most effective when your application needs the top-N hits to a query, which you can obtain with the DOMAIN_INDEX_SORT hint or in a PL/SQL cursor.

Using query templating to relax a query is more efficient than reexecuting a query.

6.2.7.3 Query Language

When you use MULTI_LEXER to index a column containing documents in different languages, you can specify which language lexer to use during querying. You do so by using the lang parameter in the query template, which specifies the document-level lexer.

select id from docs where CONTAINS (text,
'<query><textquery lang="french">bon soir</textquery></query>')>0;

See Also:

Oracle Text Reference for information on LANGUAGE and lang with ALTER INDEX and document sublexer

6.2.7.4 Ordering by SDATA Sections

You can order the query results according to the content of SDATA sections by using the <order> and <orderkey> elements of the query template.

In the following example, the first level of ordering is performed on the SDATA price section, which is sorted in ascending order. The second and third level of ordering are performed by the SDATA pub_date section and score, both of which are sorted in descending order.

select id from docs where CONTAINS (text, '
<query>
   <textquery lang="ENGLISH" grammar="CONTEXT"> Oracle </textquery>
   <score datatype="INTEGER" algorithm="COUNT"/>
   <order>
       <orderkey> SDATA(price) ASC </orderkey>
       <orderkey> SDATA(pub_date) DESC </orderKey>
       <orderkey> Score DESC </orderkey>
   </order>
</query>', 1)>0;

Note:

  • You can add additional SDATA sections to an index. Refer to the ADD SDATA SECTION parameter string under ALTER INDEX in Oracle Text Reference.

  • Documents that were indexed before adding an SDATA section do not reflect this new preference. Rebuild the index in this case.

See Also:

Oracle Text Reference for syntax of <order> and <orderkey> elements of the query template

6.2.7.5 Alternative and User-Defined Scoring

You can use query templating to specify alternative scoring algorithms. Those algorithms help you customize how CONTAINS is scored. They also enable SDATA to be used as part of the scoring expressions. In this way, you can mathematically define the scoring expression by using not only predefined scoring components, but also SDATA components.

With alternative user-defined scoring, you can specify:

  • Scoring expressions of terms by defining arithmetic expressions that define how the query should be scored, using

    • predefined scoring algorithms: DISCRETE, OCCURRENCE, RELEVANCE, and COMPLETION

    • arithmetic operations: plus, minus, multiply, divide

    • arithmetic functions: ABS(n), finding the absolute value of n ; LOG(n), finding the base-10 logarithmic value of n

    • Numeric literals

  • Scoring expressions at the term level

  • Terms that should not be taken into account when calculating the score

  • How the score from child elements of OR and AND operators should be merged

  • Use

You can also use the SDATA that stores numeric or DATETIME values to affect the final score of the document.

The following example specifies an alternative scoring algorithm:

select id from docs where CONTAINS (text,
'<query>        
 <textquery grammar="CONTEXT" lang="english"> mustang  </textquery>     
 <score datatype="float" algorithm="DEFAULT"/>     
</query>')>0

The following query templating example includes SDATA values as part of the final score:

select id from docs where CONTAINS (text,
'<query>
<textquery grammar="CONTEXT" lang="english"> mustang </textquery>
<score datatype="float" algorithm="DEFAULT" normalization_expr ="doc_score+SDATA(price)"/>
</query>')>0"

6.2.7.6 Alternative Grammar

Query templating enables you to use the CONTEXT grammar with CATSEARCH queries and vice versa.

select id from docs where CONTAINS (text,
'<query> 
  <textquery grammar="CTXCAT">San Diego</textquery>
  <score datatype="integer"/>
</query>')>0;

6.2.8 Query Analysis

Oracle Text enables you to create a log of queries and to analyze the queries. For example, suppose you have an application that searches a database of large animals, and your analysis of its queries shows that users search for the word mouse. This analysis shows you that you should rewrite your application to avoid returning an unsuccessful search. Instead, a search for mouse redirects users to a database of small animals.

With query analysis, you can find out:

  • Which queries were made

  • Which queries were successful

  • Which queries were unsuccessful

  • How many times each query was made

You can combine these factors in various ways, such as determining the 50 most frequent unsuccessful queries made by your application.

You start query logging with CTX_OUTPUT.START_QUERY_LOG. The query log contains all queries made to all CONTEXT indexes that the program is using until a CTX_OUTPUT.END_QUERY_LOG procedure is entered. Use CTX_REPORT.QUERY_LOG_SUMMARY to get a report of queries.

See Also:

Oracle Text Reference for syntax and examples for these procedures

6.2.9 Other Query Features

In your query application, you can use other query features such as proximity searching. Table 6-1 lists some of these features.

Table 6-1 Other Oracle Text Query Features

Feature Description Implement With

Case-Sensitive Searching

Enables you to search on words or phrases exactly as they are entered in the query. For example, a search on Roman returns documents that contain Roman and not roman.

BASIC_LEXER when you create the index

Base-Letter Conversion

Queries words with or without diacritical marks such as tildes, accents, and umlauts. For example, with a Spanish base-letter index, a query of energía matches documents containing both energía and energia.

BASIC_LEXER when you create the index

Word Decompounding

(German and Dutch)

Enables searching on words that contain the specified term as subcomposite.

BASIC_LEXER when you create the index

Alternate Spelling

(German, Dutch, and Swedish)

Searches on alternate spellings of words.

BASIC_LEXER when you create the index

Proximity Searching

Searches for words near one another.

NEAR operator when you enter the query

Expanded operator containing the functionality of PHRASE, NEAR and AND operators.

Breaks a document into clumps based on the given query. Each clump is classified based on a primary feature, and is scored based on secondary features. The final document score adds clump scores such that the ordering of primary features determines the initial ordering of document scores.

NEAR2 operator when you enter the query

Stemming

Searches for words with the same root as the specified term.

$ operator at when you enter the query

Fuzzy Searching

Searches for words that have a similar spelling as the specified term.

FUZZY operator when you enter the query

Query Explain Plan

Generates query parse information.

CTX_QUERY.EXPLAIN PL/SQL procedure after you index

Hierarchical Query Feedback

Generates broader term, narrower term and related term information for a query.

CTX_QUERY.HFEEDBACK PL/SQL procedure after you index

Browse index

Browses the words around a seed word in the index.

CTX_QUERY.BROWSE_WORDS PL/SQL after you index

Count hits

Counts the number of hits in a query.

CTX_QUERY.COUNT_HITS PL/SQL procedure after you index

Stored Query Expression

Stores the text of a query expression for later reuse in another query.

CTX_QUERY.STORE_SQE PL/SQL procedure after you index

Thesaural Queries

Uses a thesaurus to expand queries.

Thesaurus operators such as SYN and BT as well as the ABOUT operator

(Use CTX_THES package to maintain the thesaurus.)