4.7 Creating a CTXRULE Index

To build a document classification application, create a CTXRULE index on a table of queries. The queries define your categories, and each document in the incoming stream is classified by matching its content against them. Use the MATCHES operator to classify a single document.
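
As a rough illustration of classifying one document, the query below passes literal text to MATCHES. It is only a sketch: it assumes that the myqueries table and its CTXRULE index from the steps that follow already exist, and the sample sentence is made up for the example.

```sql
-- Return every category whose stored query matches the supplied text.
-- The document text is an illustrative literal, not data from this guide.
SELECT category
  FROM myqueries
 WHERE MATCHES(query, 'The republican candidate spoke about the budget.') > 0;
```

MATCHES returns a value greater than zero for each row whose query matches the document, so the result set is the list of categories that apply to it.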

To create a CTXRULE index and a simple document classification application:

  1. Create a table of queries.

    Create a myqueries table to hold the category name and query text, and then populate the table with the classifications and the queries that define each classification.

    CREATE TABLE myqueries (
      queryid  NUMBER PRIMARY KEY,
      category VARCHAR2(30),
      query    VARCHAR2(2000)
    );

    For example, consider a classification for the US Politics, Music, and Soccer subjects:

    INSERT INTO myqueries VALUES(1, 'US Politics', 'democrat or republican');
    INSERT INTO myqueries VALUES(2, 'Music', 'ABOUT(music)');
    INSERT INTO myqueries VALUES(3, 'Soccer', 'ABOUT(soccer)');

    Tip:

    You can also generate a table of rules (or queries) with the CTX_CLS.TRAIN procedure, which takes as input a document training set.

  2. Create the CTXRULE index.

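    Use the CREATE INDEX statement to create the CTXRULE index, and specify lexer, storage, section group, and wordlist parameters if needed. In the following statement, lexer_pref, storage_pref, section_pref, and wordlist_pref are placeholders for your own preference names.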

    CREATE INDEX myruleindex ON myqueries(query)
         INDEXTYPE IS CTXRULE PARAMETERS
               ('lexer lexer_pref 
                 storage storage_pref 
                 section group section_pref 
                 wordlist wordlist_pref');

  3. Classify a document.

    Use the MATCHES operator to classify a document.

    Assume that incoming documents are stored in the table news:

    CREATE TABLE news (
      newsid  NUMBER,
      author  VARCHAR2(30),
      source  VARCHAR2(30),
      article CLOB
    );

    Optionally, create a "before insert" trigger that uses MATCHES to route each document to a news_route table based on its classification:

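    -- The trigger name news_route_trg is illustrative; choose your own.
    CREATE OR REPLACE TRIGGER news_route_trg
      BEFORE INSERT ON news
      FOR EACH ROW
    BEGIN
      -- Find every query (category rule) that the incoming article matches
      FOR c1 IN (SELECT category
                   FROM myqueries
                  WHERE MATCHES(query, :new.article) > 0)
      LOOP
        -- Record one routing row for each matching category
        INSERT INTO news_route(newsid, category)
          VALUES (:new.newsid, c1.category);
      END LOOP;
    END;
    /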
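
The trigger in step 3 writes to a news_route table that the steps do not define. The following sketch shows one way to create that table and exercise the routing. The column names follow the INSERT statement in the trigger body, but the data types and the sample article are assumptions made for illustration.

```sql
-- Hypothetical routing table: column names come from the trigger's INSERT,
-- data types are assumed to mirror news.newsid and myqueries.category.
CREATE TABLE news_route (
  newsid   NUMBER,
  category VARCHAR2(30)
);

-- Insert a sample article (illustrative values). With the trigger in place,
-- each matching category should produce one row in news_route.
INSERT INTO news (newsid, author, source, article)
VALUES (1, 'A. Writer', 'Example Wire',
        'The republican senator gave a speech on the budget.');

-- Inspect how the article was routed.
SELECT newsid, category
  FROM news_route
 ORDER BY newsid, category;
```

Create news_route before you create the trigger so that the trigger compiles without errors. Storing one row per matching category keeps the routing table simple when a document belongs to several categories.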

See Also: