4.7 Creating a CTXRULE Index
To build a document classification application, create a CTXRULE index on a table of queries. The queries define your categories, and the stream of incoming documents is classified by content against those queries. You can use the MATCHES operator to classify single documents.
To create a CTXRULE index and a simple document classification application:
- Create a table of queries.
Create a myqueries table to hold the category name and query text, and then populate the table with the classifications and the queries that define each classification.
    CREATE TABLE myqueries (
      queryid  NUMBER PRIMARY KEY,
      category VARCHAR2(30),
      query    VARCHAR2(2000)
    );
For example, consider a classification for the US Politics, Music, and Soccer subjects:
    INSERT INTO myqueries VALUES (1, 'US Politics', 'democrat or republican');
    INSERT INTO myqueries VALUES (2, 'Music', 'ABOUT(music)');
    INSERT INTO myqueries VALUES (3, 'Soccer', 'ABOUT(soccer)');
Tip:
You can also generate a table of rules (or queries) with the CTX_CLS.TRAIN procedure, which takes a document training set as input.
- Create the CTXRULE index.
Use the CREATE INDEX statement to create the CTXRULE index and specify lexer, storage, section group, and wordlist parameters if needed.
    CREATE INDEX myruleindex ON myqueries(query)
      INDEXTYPE IS CTXRULE
      PARAMETERS ('lexer lexer_pref
                   storage storage_pref
                   section group section_pref
                   wordlist wordlist_pref');
- Classify a document.
Use the MATCHES operator to classify a document. Assume that incoming documents are stored in the table news:
    CREATE TABLE news (
      newsid  NUMBER,
      author  VARCHAR2(30),
      source  VARCHAR2(30),
      article CLOB
    );
If you want, create a "before insert" trigger with MATCHES to route each document to a news_route table based on its classification:
    -- The trigger name news_route_trg is illustrative.
    CREATE OR REPLACE TRIGGER news_route_trg
      BEFORE INSERT ON news
      FOR EACH ROW
    BEGIN
      -- find matching queries
      FOR c1 IN (SELECT category
                   FROM myqueries
                  WHERE MATCHES(query, :new.article) > 0)
      LOOP
        INSERT INTO news_route (newsid, category)
        VALUES (:new.newsid, c1.category);
      END LOOP;
    END;
    /
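The steps above refer to a news_route table without defining it, and they do not show how to check the result. The following is a minimal sketch that fills those gaps. It assumes that the myqueries table, the CTXRULE index, and the news table from the previous steps already exist. The news_route column types are an assumption inferred from the INSERT statement in the trigger, and the sample article text and author values are made up for illustration.

```sql
-- Assumed definition of the routing table: the column types are inferred
-- from the trigger's INSERT and are not part of the original example.
-- The table needs to exist for the trigger to compile.
CREATE TABLE news_route (
  newsid   NUMBER,
  category VARCHAR2(30)
);

-- Classify a single piece of text directly with MATCHES,
-- without going through the news table. The literal is a hypothetical document.
SELECT category
  FROM myqueries
 WHERE MATCHES(query, 'The republican senator spoke to reporters') > 0;

-- Exercise the trigger: insert a hypothetical article,
-- then look at the categories it was routed to.
INSERT INTO news (newsid, author, source, article)
VALUES (1, 'A. Writer', 'Example Wire',
        'The republican senator spoke to reporters');

SELECT newsid, category
  FROM news_route
 WHERE newsid = 1;
```

Each query in myqueries that matches the text yields one category row, so a document that satisfies several queries is routed to several categories.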
See Also:
- Classifying Documents in Oracle Text for more information on document classification and the CTXRULE index
- Oracle Text Reference for more information on CTX_CLS.TRAIN