3.12 Stopwords and Stopthemes

A stopword is a word that you do not want indexed. Stopwords are typically low-information words in a given language, such as this and that in English.

By default, Oracle Text provides a stoplist for indexing a given language. Modify this list or create your own with the CTX_DDL package. Specify the stoplist in the parameter string of the CREATE INDEX statement.

A stoptheme is a word that is prevented from being theme-indexed or that is prevented from contributing to a theme. Add stopthemes with the CTX_DDL package.

  • Language detection and stoplists: At query time, the language of the query is inherited from the query template or from the session language (if no language is specified through the query template).

  • Multilanguage stoplists: You create multilanguage stoplists to hold language-specific stopwords. This stoplist is useful when you use MULTI_LEXER to index a table that contains documents in different languages, such as English, German, and Japanese. At index creation, the language column of each document is examined, and only the stopwords for that language are eliminated. At query time, the session language setting determines the active stopwords, just as it determines the active lexer with the multi-lexer.