3.7 Language-Specific Features

You can enable the following language-specific features:

3.7.1 Theme Indexing

By default, you can index document theme information for English and French. A document theme is a concept that is sufficiently developed in the document.

Search document themes with the ABOUT operator, and retrieve document themes programmatically with the CTX_DOC PL/SQL package.

Enable and disable theme indexing with the index_themes attribute of the BASIC_LEXER preference type.

You can also index theme information in other languages, provided that you loaded and compiled a knowledge base for the language.
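
As a minimal sketch of the steps above, the following PL/SQL creates a BASIC_LEXER preference with theme indexing enabled, builds a CONTEXT index with it, runs an ABOUT query, and lists the themes of one document with CTX_DOC.THEMES. The table docs, its columns id and text, the preference name my_theme_lexer, the index name docs_theme_idx, the query topic, and the document key '1' are hypothetical names chosen for illustration; the sketch also assumes that id is the primary key of docs.

```sql
-- Hypothetical names: my_theme_lexer, docs (id, text), docs_theme_idx.
BEGIN
  ctx_ddl.create_preference('my_theme_lexer', 'BASIC_LEXER');
  -- Turn theme indexing on; use 'NO' to turn it off.
  ctx_ddl.set_attribute('my_theme_lexer', 'index_themes', 'YES');
END;
/

CREATE INDEX docs_theme_idx ON docs(text)
  INDEXTYPE IS CTXSYS.CONTEXT
  PARAMETERS ('LEXER my_theme_lexer');

-- Search by theme with the ABOUT operator.
SELECT id FROM docs WHERE CONTAINS(text, 'ABOUT(politics)') > 0;

-- Retrieve the themes of the document whose key is 1 (assumed to exist).
SET SERVEROUTPUT ON
DECLARE
  the_themes ctx_doc.theme_tab;
BEGIN
  ctx_doc.themes('docs_theme_idx', '1', the_themes);
  FOR i IN 1 .. the_themes.COUNT LOOP
    dbms_output.put_line(the_themes(i).theme || ': ' || the_themes(i).weight);
  END LOOP;
END;
/
```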

3.7.2 Base-Letter Conversion for Characters with Diacritical Marks

Some languages contain characters with diacritical marks, such as tildes, umlauts, and accents. When your indexing operation converts words containing diacritical marks to their base-letter form, queries do not have to contain diacritical marks to score matches.

For example, in a Spanish base-letter index, a query of energía matches energía and energia. However, if you disable base-letter indexing, a query of energía only matches energía.

Enable and disable base-letter indexing for your language with the base_letter attribute of the BASIC_LEXER preference type.
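
The following sketch shows one way to enable base-letter conversion, assuming a hypothetical table spanish_docs with columns id and text. The preference name my_spanish_lexer and the index name spanish_docs_idx are likewise illustrative.

```sql
-- Hypothetical names: my_spanish_lexer, spanish_docs (id, text), spanish_docs_idx.
BEGIN
  ctx_ddl.create_preference('my_spanish_lexer', 'BASIC_LEXER');
  -- Convert characters with diacritical marks to their base letters;
  -- use 'NO' to disable the conversion.
  ctx_ddl.set_attribute('my_spanish_lexer', 'base_letter', 'YES');
END;
/

CREATE INDEX spanish_docs_idx ON spanish_docs(text)
  INDEXTYPE IS CTXSYS.CONTEXT
  PARAMETERS ('LEXER my_spanish_lexer');

-- With base-letter conversion on, the query term does not need the accent.
SELECT id FROM spanish_docs WHERE CONTAINS(text, 'energia') > 0;
```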

See Also:

Oracle Text Reference to learn more about the BASIC_LEXER

3.7.3 Alternate Spelling

Languages such as German, Danish, and Swedish contain words that have more than one accepted spelling. For example, in German, you can substitute ae for ä. The ae character pair is known as the alternate form.

By default, Oracle Text indexes words in their alternate forms for these languages. Query terms are also converted to their alternate forms. The result is that you can query these words with either spelling.

Enable and disable alternate spelling for your language with the alternate_spelling attribute in the BASIC_LEXER preference type.
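
As an illustration, the sketch below sets alternate spelling for German explicitly. The table german_docs, its columns, the preference name my_german_lexer, the index name german_docs_idx, and the query word are hypothetical; check the Oracle Text Reference for the values that alternate_spelling accepts in your release.

```sql
-- Hypothetical names: my_german_lexer, german_docs (id, text), german_docs_idx.
BEGIN
  ctx_ddl.create_preference('my_german_lexer', 'BASIC_LEXER');
  -- Index and query German words in their alternate forms (for example, ae for ä);
  -- 'NONE' disables alternate spelling.
  ctx_ddl.set_attribute('my_german_lexer', 'alternate_spelling', 'GERMAN');
END;
/

CREATE INDEX german_docs_idx ON german_docs(text)
  INDEXTYPE IS CTXSYS.CONTEXT
  PARAMETERS ('LEXER my_german_lexer');

-- Either spelling of the query term can be used.
SELECT id FROM german_docs WHERE CONTAINS(text, 'Kaese') > 0;
SELECT id FROM german_docs WHERE CONTAINS(text, 'Käse') > 0;
```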

See Also:

Oracle Text Reference to learn more about the BASIC_LEXER

3.7.4 Composite Words

You can create composite indexes for all the languages that are supported for AUTO_LEXER and BASIC_LEXER.

With composite word indexing, a query on a term also matches words that contain the term as a subcomposite. For example, in German, a query on the term Bahnhof (train station) returns documents that contain Bahnhof or any word containing Bahnhof as a subcomposite, such as Hauptbahnhof, Nordbahnhof, or Ostbahnhof.

You can enable and disable composite indexes with the composite attribute of the AUTO_LEXER and BASIC_LEXER preferences. The default value for composite is YES (composite word indexing enabled).

When composite word indexing is enabled, words that are usually one entry in a dictionary are not split into composite stems, whereas words that are not dictionary entries are split into composite stems.
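
The sketch below shows how the composite attribute might be set on a BASIC_LEXER preference for German text. It rests on an assumption: the value 'GERMAN' is the one the Oracle Text Reference documents for BASIC_LEXER in some releases, and the accepted values may differ in yours, so confirm them before use. The table german_news, its columns, the preference name my_composite_lexer, and the index name german_news_idx are hypothetical.

```sql
-- Hypothetical names: my_composite_lexer, german_news (id, text), german_news_idx.
-- Assumption: 'GERMAN' is an accepted value of composite in your release.
BEGIN
  ctx_ddl.create_preference('my_composite_lexer', 'BASIC_LEXER');
  ctx_ddl.set_attribute('my_composite_lexer', 'composite', 'GERMAN');
END;
/

CREATE INDEX german_news_idx ON german_news(text)
  INDEXTYPE IS CTXSYS.CONTEXT
  PARAMETERS ('LEXER my_composite_lexer');

-- Intended to also match documents containing Hauptbahnhof, Nordbahnhof, and so on.
SELECT id FROM german_news WHERE CONTAINS(text, 'Bahnhof') > 0;
```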

3.7.5 Korean, Japanese, and Chinese Indexing

Use the following lexers to index Korean, Japanese, and Chinese text.

Table 3-3 Lexers for Asian Languages

Language    Lexer
Korean      AUTO_LEXER, KOREAN_MORPH_LEXER
Japanese    AUTO_LEXER, JAPANESE_LEXER, JAPANESE_VGRAM_LEXER
Chinese     AUTO_LEXER, CHINESE_LEXER, CHINESE_VGRAM_LEXER

These lexers have their own sets of attributes to control indexing.
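
As a brief sketch, a language-specific lexer is selected by creating a preference of that lexer type and naming it when the index is created. The example uses JAPANESE_VGRAM_LEXER from the table above with no attributes set; the table japanese_docs, its columns, the preference name my_japanese_lexer, and the index name japanese_docs_idx are hypothetical.

```sql
-- Hypothetical names: my_japanese_lexer, japanese_docs (id, text), japanese_docs_idx.
BEGIN
  -- No attributes are set here; see the Oracle Text Reference for the
  -- attributes that each lexer type supports.
  ctx_ddl.create_preference('my_japanese_lexer', 'JAPANESE_VGRAM_LEXER');
END;
/

CREATE INDEX japanese_docs_idx ON japanese_docs(text)
  INDEXTYPE IS CTXSYS.CONTEXT
  PARAMETERS ('LEXER my_japanese_lexer');
```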
