14.1 Overview of Oracle Text Thesaurus Features

Users of your query application looking for information on a given topic might not know which words have been used in documents that refer to that topic.

Oracle Text enables you to create case-sensitive or case-insensitive thesauruses that define synonym and hierarchical relationships between words and phrases. You can then retrieve documents that contain relevant text by expanding queries to include similar or related terms as defined in the thesaurus.

You can create a thesaurus and load it into the system.

Note:

Oracle Text thesaurus formats and functionality are compliant with both the ISO-2788 and ANSI Z39.19 (1993) standards.

14.1.1 Oracle Text Thesaurus Creation and Maintenance

If you have the CTXAPP role, you can create, modify, delete, import, and export thesauruses and thesaurus entries.

This section contains the following topics.

  • CTX_THES Package: To maintain and browse your thesaurus programmatically, you can use the CTX_THES PL/SQL package. With this package, you can browse terms and hierarchical relationships, add and delete terms, add and remove thesaurus relations, and import thesauruses into and export them from the thesaurus tables.

  • Thesaurus Operators: To expand query terms according to your loaded thesaurus, you can use the thesaurus operators in the CONTAINS clause. For example, use the SYN operator to expand a term such as dog to its synonyms:

    'syn(dog)'

  • ctxload Utility: You can use the ctxload utility to load thesauruses from a plain-text file into the thesaurus tables, and to dump thesauruses from the tables into output (or dump) files.

    You can print the thesaurus dump files, use them as input for other applications, or use them to load a thesaurus into the thesaurus tables. Reloading a dump file is useful when you want to use an existing thesaurus as the basis for a new thesaurus.

    WARNING:

    To ensure sound security practices, Oracle recommends that you enter the password for ctxload by using the interactive mode, which prompts you for the user password. Oracle strongly recommends that you do not enter a password on the command line.

    Note:

    You can also programmatically import thesauruses into and export them from the thesaurus tables by using the IMPORT_THESAURUS and EXPORT_THESAURUS procedures of the CTX_THES PL/SQL package.

    Refer to Oracle Text Reference for more information about these procedures.
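
The CTX_THES calls and the SYN operator described above can be sketched as follows. This is a minimal sketch under stated assumptions, not a definitive script: the thesaurus name pets_thes, the table docs, and its columns id and text (with a CONTEXT index on text) are hypothetical, and the relations are illustrative only.

```sql
-- Hypothetical names: pets_thes, docs, docs.id, docs.text.
-- Assumes the current user has the CTXAPP role and that docs.text
-- already has a CONTEXT index.
BEGIN
  -- Create a thesaurus (case-insensitive unless requested otherwise).
  CTX_THES.CREATE_THESAURUS('pets_thes');

  -- Record that "canine" is a synonym (SYN) of "dog".
  CTX_THES.CREATE_RELATION('pets_thes', 'dog', 'SYN', 'canine');

  -- Record that "terrier" is a narrower term (NT) of "dog".
  CTX_THES.CREATE_RELATION('pets_thes', 'dog', 'NT', 'terrier');
END;
/

-- Expand "dog" to its synonyms in pets_thes at query time.
SELECT id
FROM   docs
WHERE  CONTAINS(text, 'SYN(dog, pets_thes)') > 0;
```

Naming the thesaurus as the second argument of SYN keeps the query independent of whether a thesaurus named DEFAULT exists.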

14.1.2 Using a Case-Sensitive Thesaurus

In a case-sensitive thesaurus, terms (words and phrases) are stored exactly as you enter them. For example, if you enter a term in mixed case (using either the CTX_THES package or a thesaurus load file), then the thesaurus stores the entry in mixed case.

Note:

To take full advantage of query expansions that result from a case-sensitive thesaurus, your index must also be case-sensitive.

When loading a thesaurus, you can specify a case-sensitive thesaurus by using the -thescase parameter.

When creating a thesaurus with either CTX_THES.CREATE_THESAURUS or CTX_THES.IMPORT_THESAURUS, you can specify a case-sensitive thesaurus.

In addition, when you specify a case-sensitive thesaurus in a query, the thesaurus lookup uses the query terms exactly as you enter them in the query. Queries that use case-sensitive thesauruses therefore allow for a higher level of precision in the query expansion. This added precision improves retrieval only when the index is also case-sensitive.

For example, suppose that a case-sensitive thesaurus is created with different entries for the distinct meanings of the terms Turkey (the country) and turkey (the type of bird). Using this thesaurus, a query for Turkey expands to include only the entries associated with Turkey.
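
The Turkey example above might be set up as in the following sketch. All names (mixed_lexer, geo_thes, articles, articles_idx) and the synonym entries are hypothetical and chosen only for illustration. The sketch also assumes that a BASIC_LEXER preference with the mixed_case attribute is how the index is made case-sensitive.

```sql
-- Hypothetical names: mixed_lexer, geo_thes, articles, articles_idx.
BEGIN
  -- A case-sensitive index needs a lexer that preserves case.
  CTX_DDL.CREATE_PREFERENCE('mixed_lexer', 'BASIC_LEXER');
  CTX_DDL.SET_ATTRIBUTE('mixed_lexer', 'mixed_case', 'YES');

  -- Passing TRUE as the second argument requests a case-sensitive thesaurus.
  CTX_THES.CREATE_THESAURUS('geo_thes', TRUE);

  -- Separate entries for the country and the bird (illustrative only).
  CTX_THES.CREATE_RELATION('geo_thes', 'Turkey', 'SYN', 'Republic of Turkey');
  CTX_THES.CREATE_RELATION('geo_thes', 'turkey', 'SYN', 'gobbler');
END;
/

CREATE INDEX articles_idx ON articles(body)
  INDEXTYPE IS CTXSYS.CONTEXT
  PARAMETERS ('LEXER mixed_lexer');

-- The lookup uses "Turkey" exactly as typed, so only the entries
-- associated with the country are used in the expansion.
SELECT id
FROM   articles
WHERE  CONTAINS(body, 'SYN(Turkey, geo_thes)') > 0;
```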

14.1.3 Using a Case-Insensitive Thesaurus

In a case-insensitive thesaurus, terms are stored in all uppercase, regardless of the case in which they were originally entered.

The ctxload program loads a thesaurus in case-insensitive mode by default.

When creating a thesaurus with either CTX_THES.CREATE_THESAURUS or CTX_THES.IMPORT_THESAURUS, the thesaurus is created as case-insensitive by default.

In addition, when you specify a case-insensitive thesaurus in a query, the query terms are converted to all uppercase for thesaurus lookup. As a result, Oracle Text cannot distinguish between terms that are spelled the same but whose meanings differ according to case.

For example, suppose that a case-insensitive thesaurus is created with entries for the two distinct meanings of the term TURKEY (the country and the type of bird). Using this thesaurus, a query for either Turkey or turkey is converted to TURKEY for thesaurus lookup and then expanded to include all the entries associated with both meanings.
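
The same entries loaded into a case-insensitive thesaurus can be sketched as follows, here browsing the expansion with the CTX_THES.SYN function instead of running a query. The thesaurus name ci_thes and the synonym entries are hypothetical. No expected output is shown because the exact expansion string is not stated in this section.

```sql
SET SERVEROUTPUT ON

-- Hypothetical thesaurus name: ci_thes. Entries are illustrative only.
BEGIN
  -- No second argument: the thesaurus is case-insensitive by default.
  CTX_THES.CREATE_THESAURUS('ci_thes');

  -- Both phrases are stored as TURKEY, so the two meanings share one entry.
  CTX_THES.CREATE_RELATION('ci_thes', 'Turkey', 'SYN', 'Republic of Turkey');
  CTX_THES.CREATE_RELATION('ci_thes', 'turkey', 'SYN', 'gobbler');

  -- Per the description above, both lookups are converted to TURKEY,
  -- so the two calls should return the same expansion.
  DBMS_OUTPUT.PUT_LINE(CTX_THES.SYN('Turkey', 'ci_thes'));
  DBMS_OUTPUT.PUT_LINE(CTX_THES.SYN('turkey', 'ci_thes'));
END;
/
```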

14.1.4 Default Thesaurus

If you do not specify a thesaurus by name in a query, the thesaurus operators use a thesaurus named DEFAULT. However, Oracle Text does not provide a DEFAULT thesaurus.

As a result, if you want to use a default thesaurus for the thesaurus operators, you must create a thesaurus named DEFAULT. You can create the thesaurus through any of the thesaurus creation methods supported by Oracle Text:

  • CTX_THES.CREATE_THESAURUS (PL/SQL)

  • CTX_THES.IMPORT_THESAURUS (PL/SQL)

  • ctxload utility

    See Also:

    Oracle Text Reference to learn more about using ctxload and the CTX_THES package
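
As a minimal sketch of the PL/SQL route, the following creates a thesaurus named DEFAULT and then queries without naming a thesaurus. The table docs, its columns id and text, and the single synonym entry are hypothetical.

```sql
-- Hypothetical names: docs, docs.id, docs.text (with a CONTEXT index).
BEGIN
  -- Oracle Text supplies no thesaurus named DEFAULT, so create one.
  CTX_THES.CREATE_THESAURUS('DEFAULT');
  CTX_THES.CREATE_RELATION('DEFAULT', 'dog', 'SYN', 'canine');
END;
/

-- No thesaurus name is given, so SYN uses the thesaurus named DEFAULT.
SELECT id
FROM   docs
WHERE  CONTAINS(text, 'SYN(dog)') > 0;
```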

14.1.5 Supplied Thesaurus

Although Oracle Text does not provide a default thesaurus, it does supply a general-purpose, English-language thesaurus in the form of a load file that you load with ctxload.

You can use the thesaurus load file to create a default thesaurus for Oracle Text, or you can use it as the basis for thesauruses tailored to a specific subject or range of subjects.

  • Supplied Thesaurus Structure and Content: The supplied thesaurus is similar to a traditional thesaurus, such as Roget's Thesaurus, in that it provides a list of synonymous and semantically related terms.

    It provides additional value by organizing the terms into a hierarchy that defines real-world, practical relationships between narrower terms and their broader terms.

    Additionally, cross-references are established between terms in different areas of the hierarchy.

  • Supplied Thesaurus Location: The exact name and location of the thesaurus load file depend on the operating system. However, the file is generally named dr0thsus (with an appropriate extension for text files) and is generally located in the following directory structure:

    <Oracle_home_directory>
        <Oracle_Text_directory>
           sample
               thes

See Also: