1.2 Document Collection Applications

A text query application enables users to search document collections, such as websites, digital libraries, or document warehouses.

This section contains the following topics.

1.2.1 About Document Collection Applications

The collection is typically static, with no significant change in content after the initial indexing run. Documents can be of any size and in any format, such as HTML, PDF, or Microsoft Word. These documents are stored in a document table. Searching is enabled by first indexing the document collection.

Queries usually consist of words or phrases. Application users specify logical combinations of words and phrases by using operators such as OR and AND. Users can apply other query operations to improve the search results, such as stemming, proximity searching, and wildcarding.
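The operators mentioned above can be sketched as query strings passed to CONTAINS. This is a minimal illustration, not a complete reference: the table name docs and the column name text are hypothetical, and the queries assume a CONTEXT index already exists on that column.

```sql
-- Assumption: a hypothetical table docs(id, text) with a CONTEXT index on text.

-- Logical combination: both words must appear.
SELECT id FROM docs WHERE CONTAINS(text, 'oracle AND text') > 0;

-- Logical combination: either word may appear.
SELECT id FROM docs WHERE CONTAINS(text, 'hitlist OR ranking') > 0;

-- Stemming: $ expands the term to words that share its linguistic root.
SELECT id FROM docs WHERE CONTAINS(text, '$index') > 0;

-- Proximity: both terms must occur within a span of 5 words.
SELECT id FROM docs WHERE CONTAINS(text, 'NEAR((document, relevance), 5)') > 0;

-- Wildcarding: % matches any sequence of characters.
SELECT id FROM docs WHERE CONTAINS(text, 'docu%') > 0;
```

Which expansions the stem operator produces depends on the language settings of the index, so results for the stemming query can vary between installations.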

For this type of application, the goal is to retrieve the documents that are most relevant to a query and to rank those documents high in the result list.
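One way to put the most relevant documents at the top of the result list is to order the rows by the SCORE operator. The sketch below assumes a hypothetical table docs with a text column that carries a CONTEXT index; the label 1 ties SCORE to the matching CONTAINS call.

```sql
-- Assumption: a hypothetical table docs(id, text) with a CONTEXT index on text.
-- The third argument of CONTAINS (1) is a label that SCORE(1) refers back to.
SELECT SCORE(1) AS relevance, id
FROM docs
WHERE CONTAINS(text, 'oracle', 1) > 0
ORDER BY SCORE(1) DESC;
```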

The queries are best served with a CONTEXT index on your document table. To query this index, the application uses the SQL CONTAINS operator in the WHERE clause of a SELECT statement.
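As a minimal sketch of this setup, the statements below create a small document table, build a CONTEXT index on it, and run a CONTAINS query. The names docs, id, text, and docs_text_idx, as well as the sample rows, are hypothetical and chosen only for illustration.

```sql
-- Hypothetical document table; the names are illustrative assumptions.
CREATE TABLE docs (
  id   NUMBER PRIMARY KEY,
  text CLOB
);

INSERT INTO docs VALUES (1, 'Oracle Text indexes document collections.');
INSERT INTO docs VALUES (2, 'A hitlist ranks documents by relevance.');
COMMIT;

-- Build the CONTEXT index on the column that holds the document text.
CREATE INDEX docs_text_idx ON docs (text)
  INDEXTYPE IS CTXSYS.CONTEXT;

-- CONTAINS returns a score greater than 0 for rows that match the query.
SELECT id FROM docs WHERE CONTAINS(text, 'hitlist') > 0;
```

The index is created after the rows are loaded, which fits the mostly static collections described in this section.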

Figure 1-1 Overview of Text Query Application


1.2.2 Flowchart of Text Query Application

A typical text query application on a document collection lets the user enter a query. The application runs a CONTAINS query and returns a list of documents that satisfy the query, called a hitlist. The results are usually ranked by relevance. The application then enables the user to view one or more documents in the hitlist.

For example, an application might index URLs (HTML files) on the web and provide query capabilities across the set of indexed URLs. Hitlists returned by the query application are composed of URLs that the user can visit.

Figure 1-2 illustrates the flowchart of user interaction with a simple text query application:

  1. The user enters a query.

  2. The application runs a CONTAINS query.

  3. The application presents a hitlist.

  4. The user selects a document from the hitlist.

  5. The application presents a document to the user for viewing.
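The five steps above map onto two SQL statements in a simple application. This is a sketch under stated assumptions: the table docs and its columns id and text are hypothetical, a CONTEXT index is assumed to exist on text, and the bind variables :user_query and :selected_id stand in for values that the application would supply from user input.

```sql
-- Assumption: a hypothetical table docs(id, text) with a CONTEXT index on text.

-- Steps 1 to 3: the query entered by the user is bound to :user_query,
-- the application runs the CONTAINS query, and the rows form the hitlist,
-- ordered by relevance.
SELECT SCORE(1) AS relevance, id
FROM docs
WHERE CONTAINS(text, :user_query, 1) > 0
ORDER BY SCORE(1) DESC;

-- Steps 4 and 5: the user selects a hitlist entry, and the application
-- fetches that document by its key for viewing.
SELECT text
FROM docs
WHERE id = :selected_id;
```

Passing the query string as a bind variable rather than concatenating it into the statement keeps user input out of the SQL text itself.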

Figure 1-2 Flowchart of a Text Query Application
