3.9 About Fuzzy Matching and Stemming

Use the BASIC_WORDLIST preference to enable query options, such as stemming and fuzzy matching, for your language.

Overview

Fuzzy matching lets you match words whose spelling is similar to that of the specified term. Oracle Text provides fuzzy matching for multiple languages.

Stemming enables indexing by the stem, that is, by the linguistic root that a word shares with the specified term. For example, you can index words like speak, speaks, spoke, and spoken by the term speak, which is interpreted as the stem of those words.

Fuzzy matching and stemming are automatically enabled in your index if Oracle Text supports these features for your language.
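
As an illustration of how these two options surface in queries, the following sketch uses the stem ($) and fuzzy (? and fuzzy()) query operators with CONTAINS. The table docs, its columns id and text, and the existence of a CONTEXT index on text are assumptions made for the example, and the numeric arguments are arbitrary illustrative values.

```sql
-- Assumption: a hypothetical table docs(id, text) with a CONTEXT index on text.

-- Stem query: the $ operator expands speak to words that share its root,
-- such as speaks, spoke, and spoken.
SELECT id
  FROM docs
 WHERE CONTAINS(text, '$speak') > 0;

-- Fuzzy query: the ? operator expands a (possibly misspelled) term to
-- similarly spelled words in the index.
SELECT id
  FROM docs
 WHERE CONTAINS(text, '?goverment') > 0;

-- The fuzzy() form sets the similarity score threshold (70) and the maximum
-- number of expansions (20) for this query only. Both values are illustrative.
SELECT id, SCORE(1)
  FROM docs
 WHERE CONTAINS(text, 'fuzzy(goverment, 70, 20, weight)', 1) > 0
 ORDER BY SCORE(1) DESC;
```

Which words each query actually expands to depends on the language settings of the index and on the terms present in the indexed documents.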

Fuzzy Matching Attributes

Fuzzy matching (fuzzy_match) is enabled with default parameters for its fuzzy score and maximum number of expanded terms. The fuzzy score (fuzzy_score) is a measure of how closely the expanded word matches the query word. The fuzzy number of results (fuzzy_numresults) specifies the maximum number of fuzzy expansions. You can change these default parameters when you create the index.
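
A minimal sketch of overriding these parameters at index creation follows. The preference name my_wordlist, the index name docs_text_idx, the table docs(text), and the values 70 and 50 are all assumptions chosen for illustration, not recommended settings.

```sql
BEGIN
  -- Hypothetical preference name; BASIC_WORDLIST is the preference type.
  ctx_ddl.create_preference('my_wordlist', 'BASIC_WORDLIST');

  -- Language used for fuzzy matching and for stemming.
  ctx_ddl.set_attribute('my_wordlist', 'FUZZY_MATCH', 'ENGLISH');
  ctx_ddl.set_attribute('my_wordlist', 'STEMMER', 'ENGLISH');

  -- Illustrative overrides: require closer matches and cap the expansions.
  ctx_ddl.set_attribute('my_wordlist', 'FUZZY_SCORE', '70');
  ctx_ddl.set_attribute('my_wordlist', 'FUZZY_NUMRESULTS', '50');
END;
/

-- Assumption: a table docs with a text column to index.
CREATE INDEX docs_text_idx ON docs(text)
  INDEXTYPE IS CTXSYS.CONTEXT
  PARAMETERS ('WORDLIST my_wordlist');
```

A higher fuzzy score generally yields fewer, closer expansions, while a lower fuzzy_numresults limits how many expanded terms a single fuzzy query can produce.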

Stemming Attributes

  • Language Attribute Values for AUTO_LEXER:

    To automatically detect the language of a document and to have the necessary transformations performed, create a stem index by enabling the index_stems attribute of the AUTO_LEXER. The stemmer that corresponds to the document language is used, and it is always configured to maximize document recall.

    For languages that form compound words (for example, German, Finnish, Swedish, and Dutch), if you set composite to YES (default value), then compound word stemming is automatically performed in documents. Compounds are always separated into their component stems.

  • Language Attribute Values for BASIC_LEXER:

    To improve the performance of stem queries, create a stem index by enabling the index_stems attribute of BASIC_LEXER.

    Starting with Oracle Database 23ai, the old stemmer has been removed, making the _NEW suffix redundant. For example, ENGLISH_NEW is equivalent to ENGLISH.

    For languages that form compound words (for example, German, Finnish, Swedish, and Dutch), if you set composite to YES (default value), then compound word stemming is automatically performed in documents. Compounds are always separated into their component stems.
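
The index_stems settings described in the two items above might be configured as in the following sketch. The preference names my_auto_lexer and my_basic_lexer, the index name docs_text_idx, and the table docs(text) are hypothetical. The attribute values shown (YES for AUTO_LEXER, a language name for BASIC_LEXER) are assumptions to confirm against the reference for your release.

```sql
BEGIN
  -- AUTO_LEXER: language is detected per document; enable the stem index.
  ctx_ddl.create_preference('my_auto_lexer', 'AUTO_LEXER');
  ctx_ddl.set_attribute('my_auto_lexer', 'INDEX_STEMS', 'YES');

  -- BASIC_LEXER: name the stemmer language explicitly. From Oracle
  -- Database 23ai onward, ENGLISH and ENGLISH_NEW are equivalent.
  ctx_ddl.create_preference('my_basic_lexer', 'BASIC_LEXER');
  ctx_ddl.set_attribute('my_basic_lexer', 'INDEX_STEMS', 'ENGLISH');
END;
/

-- Use one lexer preference per index; swap in my_auto_lexer for
-- multilingual content. Assumption: a table docs with a text column.
CREATE INDEX docs_text_idx ON docs(text)
  INDEXTYPE IS CTXSYS.CONTEXT
  PARAMETERS ('LEXER my_basic_lexer');
```

Indexing stems trades a larger index for faster stem queries, because the stems are looked up directly instead of being expanded at query time.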