6.1 Overview of Queries
The basic Oracle Text query takes a query expression, usually a word with or without operators, as input. Oracle Text returns all documents (previously indexed) that satisfy the expression along with a relevance score for each document. You can use the scores to order the documents in the result set.
To enter an Oracle Text query, use the SQL SELECT
statement. Depending on the type of index, you use either the CONTAINS
or CATSEARCH
operator in the WHERE
clause. You can use these operators programatically wherever you can use the SELECT
statement, such as in PL/SQL cursors.
Use the MATCHES
operator to classify documents with a CTXRULE
index.
6.1.1 Querying with CONTAINS
When you create an index of type CONTEXT
, you must use the CONTAINS
operator to enter your query. This index is suitable for indexing collections of large coherent documents.
With the CONTAINS
operator, you can use a number of operators to define your search criteria. These operators enable you to enter logical, proximity, fuzzy, stemming, thesaurus, and wildcard searches. With a correctly configured index, you can also enter section searches on documents that have internal structure such as HTML and XML.
With CONTAINS,
you can also use the ABOUT
operator to search on document themes.
6.1.1.1 CONTAINS SQL Example
In the SELECT
statement, specify the query in the WHERE
clause with the CONTAINS
operator. Also specify the SCORE
operator to return the score of each hit in the hitlist. The following example shows how to enter a query:
SELECT SCORE(1), title from news WHERE CONTAINS(text, 'oracle', 1) > 0;
You can order the results from the highest scoring documents to the lowest scoring documents by using the ORDER
BY
clause as follows:
SELECT SCORE(1), title from news WHERE CONTAINS(text, 'oracle', 1) > 0 ORDER BY SCORE(1) DESC;
The CONTAINS
operator must always be followed by the > 0 syntax, which specifies that the score value returned by the CONTAINS
operator must be greater than zero for the row to be returned.
When the SCORE
operator is called in the SELECT
statement, the CONTAINS
operator must reference the score label value in the third parameter, as shown in the previous example.
6.1.1.2 CONTAINS PL/SQL Example
In a PL/SQL application, you can use a cursor to fetch the results of the query.
The following example enters a CONTAINS
query against the NEWS
table to find all articles that contain the word oracle. The titles and scores of the first ten hits are output.
declare rowno number := 0; begin for c1 in (SELECT SCORE(1) score, title FROM news WHERE CONTAINS(text, 'oracle', 1) > 0 ORDER BY SCORE(1) DESC) loop rowno := rowno + 1; dbms_output.put_line(c1.title||': '||c1.score); exit when rowno = 10; end loop; end;
This example uses a cursor FOR
loop to retrieve the first ten hits. An alias score is declared for the return value of the SCORE
operator. The score and title are shown as output by using the cursor dot notation.
6.1.1.3 Structured Query with CONTAINS Example
A structured query, also called a mixed query, is a query that has one CONTAINS
predicate to query a text column and another predicate to query a structured data column.
To enter a structured query, specify the structured clause in the WHERE
condition of the SELECT
statement.
For example, the following SELECT
statement returns all articles that contain the word oracle written on or after October 1, 1997:
SELECT SCORE(1), title, issue_date from news WHERE CONTAINS(text, 'oracle', 1) > 0 AND issue_date >= ('01-OCT-97') ORDER BY SCORE(1) DESC;
6.1.2 Querying with CATSEARCH
When you create an index of type CTXCAT
, you must use the CATSEARCH
operator to enter your query.
This index is suitable when your application stores short text fragments in the text column and associated information in related columns.
For example, an application serving an online auction site includes a table that stores item descriptions in a text column and date and price information in other columns. With a CTXCAT
index, you can create b-tree indexes on one or more columns, so that query performance is generally faster for mixed queries.
The operators available for CATSEARCH
queries are limited to logical operations such as AND
or OR.
To define your structured criteria, use the following operators : greater than, less than, equality, BETWEEN,
and IN.
Note:
The Oracle Text indextypeCTXCAT
is deprecated with Oracle Database 23ai. The indextype itself, and it's operator CTXCAT
, can be removed in a future release.Both CTXCAT
and the use of CTXCAT
grammar as an alternative grammar for CONTEXT
queries is deprecated. Instead, Oracle recommends that you use the CONTEXT
indextype, which can provide all the same functionality, except that it is not transactional. Near-transactional behavior in CONTEXT
can be achieved by using SYNC(ON COMMIT)
or, preferably, SYNC(EVERY [time-period])
with a short time period.
CTXCAT
was introduced when indexes were typically a few megabytes in size. Modern, large indexes, can be difficult to manage with CTXCAT
. The addition of index sets to CTXCAT
can be achieved more effectively by the use of FILTER BY
and ORDER BY
columns, or SDATA
, or both, in the CONTEXT
indextype. CTXCAT
is therefore rarely an appropriate choice. Oracle recommends that you choose the more efficient CONTEXT
indextype.
6.1.2.1 CATSEARCH SQL Query Example
A typical query with CATSEARCH
includes the following structured clause to find all rows that contain the word camera ordered by the bid_close
date:
SELECT FROM auction WHERE CATSEARCH(title, 'camera', 'order by bid_close desc')> 0;
The type of structured query tht you can enter depends on how you create your sub-indexes.
See Also:
As shown in the previous example, you specify the structured part of a CATSEARCH
query with the third structured_query
parameter. The columns in the structured expression must have a corresponding subindex.
For example, assuming that category_id
and bid_close
have a subindex in the ctxcat
index for the AUCTION
table, enter the following structured query:
SELECT FROM auction WHERE CATSEARCH(title, 'camera', 'category_id=99 order by bid_close desc')> 0;
6.1.2.2 CATSEARCH Example
The following example shows a field section search against a CTXCAT
index. It uses CONTEXT
grammar by means of a query template in a CATSEARCH
query.
-- Create and populate table create table BOOKS (ID number, INFO varchar2(200), PUBDATE DATE); insert into BOOKS values(1, '<author>NOAM CHOMSKY</author><subject>CIVIL RIGHTS</subject><language>ENGLISH</language><publisher>MIT PRESS</publisher>', '01-NOV-2003'); insert into BOOKS values(2, '<author>NICANOR PARRA</author><subject>POEMS AND ANTIPOEMS</subject><language>SPANISH</language> <publisher>VASQUEZ</publisher>', '01-JAN-2001'); insert into BOOKS values(1, '<author>LUC SANTE</author><subject>XML DATABASE</subject><language>FRENCH</language><publisher>FREE PRESS</publisher>', '15-MAY-2002'); commit;
-- Create index set and section group exec ctx_ddl.create_index_set('BOOK_INDEX_SET'); exec ctx_ddl.add_index('BOOK_INDEX_SET','PUBDATE'); exec ctx_ddl.create_section_group('BOOK_SECTION_GROUP', 'BASIC_SECTION_GROUP'); exec ctx_ddl.add_field_section('BOOK_SECTION_GROUP','AUTHOR','AUTHOR'); exec ctx_ddl.add_field_section('BOOK_SECTION_GROUP','SUBJECT','SUBJECT'); exec ctx_ddl.add_field_section('BOOK_SECTION_GROUP','LANGUAGE','LANGUAGE'); exec ctx_ddl.add_field_section('BOOK_SECTION_GROUP','PUBLISHER','PUBLISHER'); -- Create index create index books_index on books(info) indextype is ctxsys.ctxcat parameters('index set book_index_set section group book_section_group'); -- Use the index -- Note that: even though CTXCAT index can be created with field sections, it -- cannot be accessed using CTXCAT grammar (default for CATSEARCH). -- We need to use query template with CONTEXT grammar to access field -- sections with CATSEARCH select id, info from books where catsearch(info, '<query> <textquery grammar="context"> NOAM within author and english within language </textquery> </query>', 'order by pubdate')>0;
6.1.3 Querying with MATCHES
When you create an index of type CTXRULE,
you must use the MATCHES
operator to classify your documents. The CTXRULE
index is essentially an index on the set of queries that define your classifications.
For example, if you have an incoming stream of documents that need to be routed according to content, you can create a set of queries that define your categories. You create the queries as rows in a text column. You can create this type of table with the CTX_CLS.TRAIN
procedure.
You then index the table to create a CTXRULE
index. When documents arrive, you use the MATCHES
operator to classify each document
See Also:
6.1.3.1 MATCHES SQL Query
A MATCHES
query finds all rows in a query table that match a given document. Assuming that a querytable
table is associated with a CTXRULE
index, enter the following query:
SELECT classification FROM querytable WHERE MATCHES(query_string,:doc_text) > 0;
The :doc_text
bind variable contains the CLOB
document to be classified.
Here is a simple example:
create table queries ( query_id number, query_string varchar2(80) ); insert into queries values (1, 'oracle'); insert into queries values (2, 'larry or ellison'); insert into queries values (3, 'oracle and text'); insert into queries values (4, 'market share'); create index queryx on queries(query_string) indextype is ctxsys.ctxrule; select query_id from queries where matches(query_string, 'Oracle announced that its market share in databases increased over the last year.')>0
This query returns queries 1 (the word oracle appears in the document) and 4 (the phrase market share appears in the document), but not 2 (neither the word larry nor the word ellison appears, and not 3 (there is no text in the document, so it does not match the query).
In this example, the document was passed in as a string for simplicity. Your document is typically passed in a bind variable.
The document text used in a MATCHES
query can be VARCHAR2
or CLOB.
It does not accept BLOB
input, so you cannot match filtered documents directly. Instead, you must filter the binary content to CLOB
by using AUTO_FILTER.
The following example makes two assumptions:
-
The document data is in the
:doc_blob
bind variable. -
You have already defined
my_policy
thatCTX_DOC.POLICY_FILTER
can use.
For example:
declare doc_text clob; begin -- create a temporary CLOB to hold the document text doc_text := dbms_lob.createtemporary(doc_text, TRUE, DBMS_LOB.SESSION); -- create a simple policy for this example ctx_ddl.create_preference(preference_name => 'fast_filter', object_name => 'AUTO_FILTER'); ctx_ddl.set_attribute(preference_name => 'fast_filter', attribute_name => 'OUTPUT_FORMATTING', attribute_value => 'FALSE'); ctx_ddl.create_policy(policy_name => 'my_policy', filter => 'fast_filter); -- call ctx_doc.policy_filter to filter the BLOB to CLOB data ctx_doc.policy_filter('my_policy', :doc_blob, doc_text, FALSE); -- now do the matches query using the CLOB version for c1 in (select * from queries where matches(query_string, doc_text)>0) loop -- do what you need to do here end loop; dbms_lob.freetemporary(doc_text); end;
The CTX_DOC.POLICY_FILTER
procedure filters the BLOB
into the CLOB
data, because you must get the text into a CLOB
to enter a MATCHES
query. It takes, as one argument, the name of a policy that you already created with CTX_DDL.CREATE_POLICY
.
See Also:
Oracle Text Reference for information on CTX_DOC.POLICY_FILTER
If your file is text in the database character set, then you can create a BFILE
and load it to a CLOB
by using the DBMS_LOB.LOADFROMFILE
function, or you can use UTL_FILE
to read the file into a temp CLOB
locator.
If your file needs AUTO_FILTER
filtering, then you can load the file into a BLOB
instead and call CTX_DOC.POLICY_FILTER
, as previously shown.
See Also:
Classifying Documents in Oracle Text for more extended classification examples
6.1.3.2 MATCHES PL/SQL Examples
The following example assumes that the profiles
table of queries is associated with a CTXRULE
index. It also assumes that the newsfeed
table contains a set of news articles to be categorized.
This example loops through the newsfeed
table, categorizing each article by using the MATCHES
operator. The results are stored in the results
table.
PROMPT Populate the category table based on newsfeed articles PROMPT set serveroutput on; declare mypk number; mytitle varchar2(1000); myarticles clob; mycategory varchar2(100); cursor doccur is select pk,title,articles from newsfeed; cursor mycur is select category from profiles where matches(rule, myarticles)>0; cursor rescur is select category, pk, title from results order by category,pk; begin dbms_output.enable(1000000); open doccur; loop fetch doccur into mypk, mytitle, myarticles; exit when doccur%notfound; open mycur; loop fetch mycur into mycategory; exit when mycur%notfound; insert into results values(mycategory, mypk, mytitle); end loop; close mycur; commit; end loop; close doccur; commit; end;
The following example displays the categorized articles by category.
PROMPT display the list of articles for every category PROMPT set serveroutput on; declare mypk number; mytitle varchar2(1000); mycategory varchar2(100); cursor catcur is select category from profiles order by category; cursor rescur is select pk, title from results where category=mycategory order by pk; begin dbms_output.enable(1000000); open catcur; loop fetch catcur into mycategory; exit when catcur%notfound; dbms_output.put_line('********** CATEGORY: '||mycategory||' *************'); open rescur; loop fetch rescur into mypk, mytitle; exit when rescur%notfound; dbms_output.put_line('** ('||mypk||'). '||mytitle); end loop; close rescur; dbms_output.put_line('**'); dbms_output.put_line('*******************************************************'); end loop; close catcur; end;
See Also:
Classifying Documents in Oracle Text for more extended classification examples
6.1.4 Word and Phrase Queries
A word query is a query on a word or phrase. For example, to find all the rows in your text table that contain the word dog, enter a query specifying dog as your query term.
You can enter word queries with both CONTAINS
and CATSEARCH
SQL operators. However, phrase queries are interpreted differently.
-
CONTAINS Phrase Queries: If multiple words are contained in a query expression, separated only by blank spaces (no operators), the string of words is considered a phrase. Oracle Text searches for the entire string during a query. For example, to find all documents that contain the phrase international law, enter your query with the phrase international law.
-
CATSEARCH Phrase Queries: With the
CATSEARCH
operator, you insert theAND
operator between words in phrases. For example, a query such as international law is interpreted as international AND law.
6.1.5 Querying Stopwords
Stopwords are words for which Oracle Text does not create an index entry. They are usually common words in your language that are unlikely to be searched.
Oracle Text includes a default list of stopwords for your language. This list is called a stoplist. For example, in English, the words this and that are defined as stopwords in the default stoplist. You can modify the default stoplist or create new stoplists with the CTX_DDL
package. You can also add stopwords after indexing with the ALTER INDEX
statement.
You cannot query on a stopword itself or on a phrase composed of only stopwords. For example, a query on the word this returns no hits when this is defined as a stopword.
Because the Oracle Text index records the position of stopwords even though it does not create an index entry for them, you can query phrases that contain stopwords as well as indexable words, such as this boy talks to that girl.
When you include a stopword within your query phrase, the stopword matches any word. For example, the following query assumes that was is a stopword. It matches phrases such as Jack is big and Jack grew big. It also matches grew, even though it is not a stopword.
'Jack was big'
Starting with Oracle Database 12c Release 2 (12.2), stopwords and unary operators on stopwords are ignored at the initial stages of a query result in different query results than earlier releases. For example, the following query does not return documents because the
is a stopword and the $
operator and the stopword are ignored during query processing:
SQL> select count(1) from tabx where contains(text,'$the')>0;
.
COUNT(1)
----------
0
The next query returns documents containing first
because the the
stopword and the $
operator are ignored.
SQL> select count(1) from tabx where contains(text,'first and $the')>0;
.
COUNT(1)
----------
2
6.1.6 ABOUT Queries and Themes
An ABOUT
query is a query on a document theme. A document theme is a concept that is sufficiently developed in the text. For example, an ABOUT
query on US politics might return documents containing information about US presidential elections and US foreign policy. Documents need not contain the exact phrase US politics to be returned.
During indexing, document themes are derived from the knowledge base, which is a hierarchical list of categories and concepts that represents a view of the world. Some examples of themes in the knowledge catalog are concrete concepts such as jazz music, football, or Nelson Mandela. Themes can also be abstract concepts such as happiness or honesty.
During indexing, the system can also identify and index document themes that are sufficiently developed in the document but that do not exist in the knowledge base.
You can augment the knowledge base to define concepts and terms specific to your industry or query application. When you do so, ABOUT
queries are more precise for the added concepts.
ABOUT
queries perform best when you create a theme component in your index. Theme components are created by default for English and French.
See Also:
Querying Stopthemes
Oracle Text enables you to query on themes with the ABOUT
operator. A stoptheme is a theme that is not to be indexed. You can add and remove stopthemes with the CTX_DDL
package. You can add stopthemes after indexing with the ALTER INDEX
statement.