3.7 Language-Specific Features
You can enable the following language-specific features:
3.7.1 Theme Indexing
By default, you can index document theme information in English and French. A document theme is a concept that is sufficiently developed in the document.
Search document themes with the ABOUT operator and retrieve document themes programmatically with the CTX_DOC PL/SQL package.
Enable and disable theme indexing with the index_themes attribute of the BASIC_LEXER preference type.
You can also index theme information in other languages, provided that you have loaded and compiled a knowledge base for the language.
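As a minimal sketch of the steps above, the following creates a lexer preference with theme indexing enabled, builds an index that uses it, and runs an ABOUT query. The names my_theme_lexer, docs, id, text, and docs_idx are hypothetical. The sketch also assumes that the current user can execute CTX_DDL and that the English knowledge base is installed.

```sql
-- Hypothetical names: my_theme_lexer, docs (id, text), docs_idx.
BEGIN
  -- Create a BASIC_LEXER preference and turn theme indexing on.
  ctx_ddl.create_preference('my_theme_lexer', 'BASIC_LEXER');
  ctx_ddl.set_attribute('my_theme_lexer', 'index_themes', 'YES');
END;
/

-- Build a CONTEXT index that uses the lexer preference.
CREATE INDEX docs_idx ON docs (text)
  INDEXTYPE IS CTXSYS.CONTEXT
  PARAMETERS ('LEXER my_theme_lexer');

-- Find documents whose themes relate to the concept "politics".
SELECT id FROM docs WHERE CONTAINS(text, 'ABOUT(politics)') > 0;
```

Setting index_themes to NO in the same preference disables theme indexing for indexes created with it.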
See Also:
Oracle Text Reference to learn more about the BASIC_LEXER
3.7.2 Base-Letter Conversion for Characters with Diacritical Marks
Some languages contain characters with diacritical marks, such as tildes, umlauts, and accents. When your indexing operation converts words containing diacritical marks to their base-letter form, queries do not have to contain diacritical marks to score matches.
For example, in a Spanish base-letter index, a query of energía matches energía and energia. However, if you disable base-letter indexing, a query of energía only matches energía.
Enable and disable base-letter indexing for your language with the base_letter attribute of the BASIC_LEXER preference type.
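The Spanish example can be sketched as follows. The names my_base_lexer, docs_es, id, text, and docs_es_idx are hypothetical, and the sketch assumes that the current user can execute CTX_DDL.

```sql
-- Hypothetical names: my_base_lexer, docs_es (id, text), docs_es_idx.
BEGIN
  -- Convert characters with diacritical marks to their base letters.
  ctx_ddl.create_preference('my_base_lexer', 'BASIC_LEXER');
  ctx_ddl.set_attribute('my_base_lexer', 'base_letter', 'YES');
END;
/

CREATE INDEX docs_es_idx ON docs_es (text)
  INDEXTYPE IS CTXSYS.CONTEXT
  PARAMETERS ('LEXER my_base_lexer');

-- The query term has no accent; with base-letter indexing it is
-- expected to match both "energía" and "energia" in the documents.
SELECT id FROM docs_es WHERE CONTAINS(text, 'energia') > 0;
```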
See Also:
Oracle Text Reference to learn more about the BASIC_LEXER
3.7.3 Alternate Spelling
Languages such as German, Danish, and Swedish contain words that have more than one accepted spelling. For example, in German, you can substitute ae for ä. The ae character pair is known as the alternate form.
By default, Oracle Text indexes words in their alternate forms for these languages. Query terms are also converted to their alternate forms. The result is that you can query these words with either spelling.
Enable and disable alternate spelling for your language with the alternate_spelling attribute of the BASIC_LEXER preference type.
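As a hedged illustration, the following sets the attribute explicitly for German. The preference name my_german_lexer is hypothetical. The sketch assumes that the current user can execute CTX_DDL and that the attribute takes a language name such as GERMAN, DANISH, or SWEDISH, with NONE turning the feature off; check Oracle Text Reference for the values that your release supports.

```sql
-- Hypothetical name: my_german_lexer.
BEGIN
  ctx_ddl.create_preference('my_german_lexer', 'BASIC_LEXER');
  -- Index and query German words in their alternate forms (ae for ä, and so on).
  ctx_ddl.set_attribute('my_german_lexer', 'alternate_spelling', 'GERMAN');
END;
/
```

Reference the preference in the PARAMETERS clause of CREATE INDEX, for example PARAMETERS ('LEXER my_german_lexer'), so that indexing and queries use the same conversion.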
See Also:
Oracle Text Reference to learn more about the BASIC_LEXER
3.7.4 Composite Words
You can create composite indexes for all the languages that are supported for AUTO_LEXER and BASIC_LEXER.
As a result, a query on a term returns words that contain the term as a subcomposite. For example, in German, a query on the term Bahnhof (train station) returns documents that contain Bahnhof or any word containing Bahnhof as a subcomposite, such as Hauptbahnhof, Nordbahnhof, or Ostbahnhof.
You can enable and disable composite indexes with the composite attribute of the AUTO_LEXER and BASIC_LEXER preferences. The default value for composite is YES (composite word indexing enabled).
When composite word indexing is disabled, words that are usually one entry in a dictionary are not split into composite stems. Words that are not dictionary entries are split into composite stems.
3.7.5 Korean, Japanese, and Chinese Indexing
The following table lists the lexers that you can use to index Korean, Japanese, and Chinese text.
Table 3-3 Lexers for Asian Languages
Language | Lexer |
---|---|
Korean | |
Japanese | |
Chinese | |
These lexers have their own sets of attributes to control indexing.
Related Topics