2.3 Query Application Quick Tour

In a basic text query application, users enter query words or phrases and expect the application to return a list of documents that best match the query. Such an application involves creating a CONTEXT index and querying it with CONTAINS.

Typically, query applications require a user interface. An example of how to build such a query application using the CONTEXT index type is given in CONTEXT Query Application.

The examples in this section provide the basic SQL statements to load the text table, index the documents, and query the index.

2.3.1 Creating the Text Table

Perform the following steps to create and load documents into a table.

  1. Connect as the new user.

    Before creating any tables, assume the identity of the user that you created.

    CONNECT myuser;
    
  2. Create your text table.

    The following example creates a table called docs with two columns, id and text, by using the CREATE TABLE statement. This example makes the id column the primary key. The text column is VARCHAR2.

    CREATE TABLE docs (id NUMBER PRIMARY KEY, text VARCHAR2(200));

    Note:

    Primary keys of the following type are supported: NUMBER, VARCHAR2, DATE, CHAR, VARCHAR, and RAW.

  3. Load documents into the table.

    Use the SQL INSERT statement to load text into a table.

    To populate the docs table, use the INSERT statement:

    INSERT INTO docs VALUES(1, '<HTML>California is a state in the US.</HTML>');
    INSERT INTO docs VALUES(2, '<HTML>Paris is a city in France.</HTML>');
    INSERT INTO docs VALUES(3, '<HTML>France is in Europe.</HTML>');

2.3.2 Using SQL*Loader to Load the Table

You can use SQL*Loader to load a table in batches.

Perform the following steps to load your table in batches with SQL*Loader:

  1. Create the CONTEXT index.

    Index the HTML files by creating a CONTEXT index on the text column as follows. Because you are indexing HTML, this example uses the NULL_FILTER preference type for no filtering and the HTML_SECTION_GROUP type. If you index PDF, Microsoft Word, or other formatted documents, then use the CTXSYS.AUTO_FILTER (the default) as your FILTER preference.

    CREATE INDEX idx_docs ON docs(text)
         INDEXTYPE IS CTXSYS.CONTEXT PARAMETERS
         ('FILTER CTXSYS.NULL_FILTER SECTION GROUP CTXSYS.HTML_SECTION_GROUP');
    

    This example also uses the HTML_SECTION_GROUP section group, which is recommended for indexing HTML documents. Using HTML_SECTION_GROUP enables you to search within specific HTML tags and eliminate unwanted markup, such as font information, from the index.

  2. Query your table with CONTAINS.

    First, set the format of the SELECT statement's output so that it is easily readable. Set the width of the text column to 40 characters:

    COLUMN text FORMAT a40;
    

    Next, query the table with the SELECT statement with CONTAINS. This query retrieves the document IDs that satisfy the query. The following query looks for all documents that contain the word France:

    SELECT SCORE(1), id, text FROM docs WHERE CONTAINS(text, 'France', 1) > 0;
    
      SCORE(1)         ID TEXT
    ---------- ---------- ----------------------------------------
             4          3 <HTML>France is in Europe.</HTML>
             4          2 <HTML>Paris is a city in France.</HTML>
  3. Present the document.

    In a real-world application, you could present the selected document with query terms highlighted. Oracle Text enables you to mark up documents with the CTX_DOC package.

    You can demonstrate HTML document markup with an anonymous PL/SQL block in SQL*Plus. However, in a real-world application, you could present the document in a browser.

    This PL/SQL example uses the in-memory version of CTX_DOC.MARKUP to highlight the word France in document 3. It allocates a temporary CLOB (character large object data type) to store the markup text and reads it back to the standard output. The CLOB is then deallocated before exiting:

    SET SERVEROUTPUT ON;
    DECLARE
      2  mklob CLOB;
      3  amt NUMBER := 40;
      4  line VARCHAR2(80);
      5  BEGIN
      6    CTX_DOC.MARKUP('idx_docs','3','France', mklob);
      7    DBMS_LOB.READ(mklob, amt, 1, line);
      8    DBMS_OUTPUT.PUT_LINE('FIRST 40 CHARS ARE:'||line);
      9    DBMS_LOB.FREETEMPORARY(mklob);
     10    END;
     11  /
    FIRST 40 CHARS ARE:<HTML><<<France>>> is in Europe.</HTML>
    
    PL/SQL procedure successfully completed.
  4. Synchronize the index after data manipulation.

    When you create a CONTEXT index, you explicitly synchronize your index to update it with any inserts, updates, or deletions to the text table.

    Oracle Text enables you to do so with the CTX_DDL.SYNC_INDEX procedure.

    Add some rows to the docs table:

    INSERT INTO docs VALUES(4, '<HTML>Los Angeles is a city in California.</HTML>');
    INSERT INTO docs VALUES(5, '<HTML>Mexico City is big.</HTML>');
    

    Because the index is not synchronized, these new rows are not returned with a query on city:

    SELECT SCORE(1), id, text FROM docs WHERE CONTAINS(text, 'city', 1) > 0;
    
      SCORE(1)         ID TEXT
    ---------- ---------- --------------------------------------------------
             4          2 <HTML>Paris is a city in France.</HTML>
    

    Therefore, synchronize the index with 2 Mb of memory and rerun the query:

    EXEC CTX_DDL.SYNC_INDEX('idx_docs', '2M');
    
    PL/SQL procedure successfully completed.
    
    COLUMN text FORMAT a50;
    SELECT SCORE(1), id, text FROM docs WHERE CONTAINS(text, 'city', 1) > 0;
    
      SCORE(1)         ID TEXT
    ---------- ---------- --------------------------------------------------
             4          5 <HTML>Mexico City is big.</HTML>
             4          4 <HTML>Los Angeles is a city in California.</HTML>
             4          2 <HTML>Paris is a city in France.</HTML>

    See Also:

    "Building the PSP Web Application" for an example of how to use SQL*Loader to load a text table from a data file