3.9 About Fuzzy Matching and Stemming
Use the BASIC_WORDLIST preference to enable query options, such as stemming and fuzzy matching, for your language.
Overview
Fuzzy matching allows you to match words that are spelled similarly to the specified term. Oracle Text provides entity extraction for multiple languages.
Stemming enables indexing by the stem (the same linguistic root as the specified term). For example, you can index words like speak, speaks, spoke, and spoken by the term speak. The term speak is interpreted as the stem of those words.
Fuzzy matching and stemming are automatically enabled in your index if Oracle Text supports these features for your language.
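The following queries are a minimal sketch of how these two options are typically used at query time. The table docs, its columns id and text, and the existence of a CONTEXT index on docs(text) are assumptions made for illustration; the search terms are arbitrary.

```sql
-- Assumes a hypothetical table docs(id, text) with a CONTEXT index on text.

-- Stem query: the $ operator expands the term to words that share its stem,
-- so documents containing speaks, spoke, or spoken can also match.
SELECT id
  FROM docs
 WHERE CONTAINS(text, '$speak') > 0;

-- Fuzzy query: fuzzy(term, score, numresults, weight) expands the term to
-- similarly spelled words. Here the similarity score threshold is 70 and
-- at most 6 expansions are used.
SELECT id, SCORE(1) AS relevance
  FROM docs
 WHERE CONTAINS(text, 'fuzzy(government, 70, 6, weight)', 1) > 0;
```

Which expansions are actually returned depends on the indexed data and on the language settings of the index.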
Fuzzy Matching Attributes
Fuzzy matching (fuzzy_match) is enabled with default parameters for its fuzzy score and maximum number of expanded terms. Fuzzy score (fuzzy_score) is a measure of how closely the expanded word matches the query word. Fuzzy number results (fuzzy_numresults) specifies the maximum number of fuzzy expansions. At index time, you can change these default parameters.
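As an illustration of changing these defaults at index time, the sketch below creates a BASIC_WORDLIST preference and sets the three fuzzy attributes before building the index. The preference name my_wordlist, the index name docs_idx, the table docs, and the chosen values are hypothetical; check the permitted value ranges in the reference for your release.

```sql
BEGIN
  -- Hypothetical preference name; BASIC_WORDLIST is the wordlist type.
  ctx_ddl.create_preference('my_wordlist', 'BASIC_WORDLIST');

  -- Language used for fuzzy expansion.
  ctx_ddl.set_attribute('my_wordlist', 'FUZZY_MATCH', 'ENGLISH');

  -- Minimum similarity score an expansion must reach (illustrative value).
  ctx_ddl.set_attribute('my_wordlist', 'FUZZY_SCORE', '70');

  -- Maximum number of fuzzy expansions per term (illustrative value).
  ctx_ddl.set_attribute('my_wordlist', 'FUZZY_NUMRESULTS', '50');
END;
/

-- Assumes a hypothetical table docs(id, text) with no existing CONTEXT index.
CREATE INDEX docs_idx ON docs(text)
  INDEXTYPE IS CTXSYS.CONTEXT
  PARAMETERS ('WORDLIST my_wordlist');
```

Values supplied directly in a fuzzy query operator apply to that query only; the wordlist values act as the defaults when the query does not specify them.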
Stemming Attributes
- Language Attribute Values for AUTO_LEXER:
  To automatically detect the language of a document and to have the necessary transformations performed, create a stem index by enabling the index_stems attribute of the AUTO_LEXER. Use the stemmer that corresponds to the document language and always configure the stemmer to maximize document recall.
  For languages with compound words (for example, German, Finnish, Swedish, or Dutch), if you set composite to YES (default value), then compound word stemming is automatically performed in documents. Compounds are always separated into their component stems.
- Language Attribute Values for BASIC_LEXER:
  To improve the performance of stem queries, create a stem index by enabling the index_stems attribute of BASIC_LEXER.
  Starting with Oracle Database 23ai, the old stemmer has been removed, making the _NEW suffix redundant. For example, ENGLISH_NEW is equivalent to ENGLISH.
  For languages with compound words (for example, German, Finnish, Swedish, or Dutch), if you set composite to YES (default value), then compound word stemming is automatically performed in documents. Compounds are always separated into their component stems.
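The sketch below shows one way a stem index might be enabled for each lexer type. The preference names, the index name docs_stem_idx, and the table docs are hypothetical, and the attribute values (YES for AUTO_LEXER, a language name for BASIC_LEXER) are assumptions to verify against the lexer reference for your release.

```sql
BEGIN
  -- AUTO_LEXER: language is detected per document; stems are indexed
  -- when index_stems is enabled.
  ctx_ddl.create_preference('my_auto_lexer', 'AUTO_LEXER');
  ctx_ddl.set_attribute('my_auto_lexer', 'INDEX_STEMS', 'YES');

  -- BASIC_LEXER: index_stems names the stemmer language to use.
  ctx_ddl.create_preference('my_basic_lexer', 'BASIC_LEXER');
  ctx_ddl.set_attribute('my_basic_lexer', 'INDEX_STEMS', 'ENGLISH');
END;
/

-- Use one lexer preference per index. Assumes a hypothetical table
-- docs(id, text) that does not already have a CONTEXT index on text.
CREATE INDEX docs_stem_idx ON docs(text)
  INDEXTYPE IS CTXSYS.CONTEXT
  PARAMETERS ('LEXER my_basic_lexer');
```

Indexing stems increases index size in exchange for faster stem queries, so it is worth enabling mainly when stem queries are common.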