14.10 About the Supplied Knowledge Base

Oracle Text supplies a knowledge base for English and French. The supplied knowledge base contains the information used to perform theme analysis. Theme analysis includes theme indexing, ABOUT queries, and theme extraction with the CTX_DOC package.

The knowledge base is a hierarchical tree of concepts and categories. It has six main branches:

  • Science and technology

  • Business and economics

  • Government and military

  • Social environment

  • Geography

  • Abstract ideas and concepts

The supplied knowledge base is like a thesaurus in that it is hierarchical and contains broader terms, narrower terms, and related terms. Because of this structure, you can improve the accuracy of theme analysis by augmenting the knowledge base with your industry-specific thesaurus, linking its new terms to existing terms.

You can also extend theme functionality to other languages by compiling a language-specific thesaurus into a knowledge base.

Knowledge bases can be in any single-byte character set. Supplied knowledge bases are in WE8ISO8859P1. You can store an extended knowledge base in another character set such as US7ASCII.

This section contains the following topics:

  • Adding a Language-Specific Knowledge Base

  • Limitations for Adding Knowledge Bases

14.10.1 Adding a Language-Specific Knowledge Base

You can extend theme functionality to languages other than English or French by loading your own knowledge base for any single-byte whitespace-delimited language, including Spanish.

Theme functionality includes theme indexing, ABOUT queries, theme highlighting, and the generation of themes, gists, and theme summaries with CTX_DOC.

You extend theme functionality by adding a user-defined knowledge base. For example, you can create a Spanish knowledge base from a Spanish thesaurus.
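
As an illustration of what such a thesaurus might look like, the following sketch writes a very small Spanish thesaurus file in the indented import format that ctxload reads (a phrase in the first column, with its BT, NT, and SYN relations indented beneath it). The file name spanish_thes.txt, the helper write_thesaurus, and the terms themselves are hypothetical; a real thesaurus would be far larger.

```shell
#!/bin/sh
# Sketch: create a minimal Spanish thesaurus file for ctxload.
# The file name and all terms are hypothetical examples.
THES_FILE=${THES_FILE:-spanish_thes.txt}

write_thesaurus() {
  # Each phrase starts in column 1; its relations are indented beneath it.
  # BT = broader term, NT = narrower term, SYN = synonym.
  cat > "$1" <<'EOF'
felino
 BT animal
 NT gato
gato
 SYN minino
EOF
}

write_thesaurus "$THES_FILE"
```

The terms are deliberately unaccented so that the file is valid in any single-byte character set.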

To load your language-specific knowledge base:

  1. Load your custom thesaurus by using ctxload.
  2. Set NLS_LANG so that the language portion is the target language. The charset portion must be a single-byte character set.
  3. Compile the loaded thesaurus by using ctxkbtc, and enter the password for -user when you are prompted. The following command compiles your language-specific knowledge base from the loaded thesaurus:
    ctxkbtc -user ctxsys -name my_lang_thes
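
The three steps above can be sketched as one script. This is only an outline under stated assumptions: the thesaurus file name spanish_thes.txt, the NLS_LANG value SPANISH_SPAIN.WE8ISO8859P1, the use of ctxsys for ctxload, and the DRY_RUN convention are all illustrative and not taken from this guide. By default the script only prints the commands; set DRY_RUN=0 to run them.

```shell
#!/bin/sh
# Sketch: load a thesaurus and compile it into a knowledge base.
# Assumed values (hypothetical): file name, NLS_LANG setting, DRY_RUN switch.
THES_NAME=my_lang_thes
THES_FILE=spanish_thes.txt

# Print the command when DRY_RUN is 1 (the default); execute it otherwise.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

build_kb() {
  # Step 1: load the custom thesaurus with ctxload.
  run ctxload -user ctxsys -thes -name "$THES_NAME" -file "$THES_FILE"

  # Step 2: target language plus a single-byte character set.
  NLS_LANG=SPANISH_SPAIN.WE8ISO8859P1
  export NLS_LANG
  echo "NLS_LANG=$NLS_LANG"

  # Step 3: compile the loaded thesaurus; ctxkbtc prompts for the password.
  run ctxkbtc -user ctxsys -name "$THES_NAME"
}

build_kb
```

Keeping the dry run as the default makes it possible to review the exact command lines before anything touches the database.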

To use this knowledge base for theme analysis during indexing and ABOUT queries, specify the NLS_LANG language as the THEME_LANGUAGE attribute value for the BASIC_LEXER preference.
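
A lexer preference of this kind might be set up as follows. The sketch is a shell function that prints the PL/SQL so that it can be reviewed and then piped into SQL*Plus. The preference name my_es_lexer and the function lexer_sql are hypothetical, and SPANISH is assumed to match the language portion of the NLS_LANG used at compile time. INDEX_THEMES is also switched on, on the assumption that theme indexing is wanted.

```shell
#!/bin/sh
# Sketch: emit PL/SQL that creates a BASIC_LEXER preference whose
# THEME_LANGUAGE matches the language of the compiled knowledge base.
# Preference name and language defaults are hypothetical.
lexer_sql() {
  pref=${1:-my_es_lexer}
  lang=${2:-SPANISH}
  cat <<EOF
BEGIN
  ctx_ddl.create_preference('$pref', 'BASIC_LEXER');
  ctx_ddl.set_attribute('$pref', 'INDEX_THEMES', 'YES');
  ctx_ddl.set_attribute('$pref', 'THEME_LANGUAGE', '$lang');
END;
/
EOF
}

# Print the statements for review.
lexer_sql
```

The preference would then be named in the LEXER parameter when the index is created.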

14.10.2 Limitations for Adding Knowledge Bases

Here are the limitations for adding knowledge bases:

  • Oracle supplies knowledge bases only in English and French. You must provide your own thesaurus for any other language.

  • You can add knowledge bases only for languages with single-byte character sets. You cannot create a knowledge base for languages that can be expressed only in multibyte character sets. If the database uses a multibyte universal character set, such as UTF-8, you must still set the NLS_LANG parameter to a compatible single-byte character set when you compile the thesaurus.

  • Adding a knowledge base works best for whitespace-delimited languages.

  • Only one knowledge base is allowed for each NLS_LANG language.

  • Obtaining hierarchical query feedback information (for example, broader terms, narrower terms, and related terms) does not work in languages other than English and French. In other languages, the knowledge bases are derived entirely from your thesauruses. In such cases, Oracle recommends that you obtain hierarchical information directly from your thesauruses.

    See Also:

    Oracle Text Reference for more information about theme indexing, ABOUT queries, using the CTX_DOC package, and the supplied English knowledge base
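
For the last limitation above, one way to read hierarchical information directly from a thesaurus is through the CTX_THES functions BT, NT, and RT. The sketch below prints sample queries for review. The function name hier_sql, the term gato, and the thesaurus name my_lang_thes are placeholders for illustration only.

```shell
#!/bin/sh
# Sketch: emit queries that read broader, narrower, and related terms
# straight from a loaded thesaurus by using the CTX_THES package.
# The default term and thesaurus name are hypothetical.
hier_sql() {
  term=${1:-gato}
  thes=${2:-my_lang_thes}
  cat <<EOF
SELECT ctx_thes.bt('$term', 1, '$thes') AS broader FROM dual;
SELECT ctx_thes.nt('$term', 1, '$thes') AS narrower FROM dual;
SELECT ctx_thes.rt('$term', '$thes') AS related FROM dual;
EOF
}

# Print the queries for review.
hier_sql
```

The second argument to BT and NT is the number of hierarchy levels to traverse; 1 returns only the immediate broader or narrower terms.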