CREATE_LANG_DATA

Use the DBMS_VECTOR_CHAIN.CREATE_LANG_DATA chunker helper procedure to load your own language data file into the database.

Purpose

To create custom language data for your chosen language (specified using the language chunking parameter).

A language data file contains language-specific abbreviation tokens. You can supply this data to the chunker to help it determine the sentence boundaries of chunks accurately, using knowledge of the input language's end-of-sentence (EOS) punctuation, abbreviations, and contextual rules.

Usage Notes

  • All supported languages are distributed with default language-specific abbreviation dictionaries. You can create language data based on the abbreviation tokens loaded in a schema.table.column, using a user-specified language data name (PREFERENCE_NAME).

  • After loading your language data, you can use language-specific chunking by specifying the language chunking parameter with VECTOR_CHUNKS or UTL_TO_CHUNKS.

  • You can query these data dictionary views to access existing language data:
    • ALL_VECTOR_LANG displays all available language data.

    • USER_VECTOR_LANG displays language data from the schema of the current user.

    • ALL_VECTOR_ABBREV_TOKENS displays abbreviation tokens from all available language data.

    • USER_VECTOR_ABBREV_TOKENS displays abbreviation tokens from the language data owned by the current user.
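
The notes above can be sketched as follows. This is a minimal illustration, not a definitive recipe: the table documentation_tab and its id and text columns are hypothetical, the chunking values (words, 50, 0) are arbitrary, and SELECT * is used because the columns of the views are not listed in this section.

```sql
-- List the language data owned by the current user and its abbreviation tokens.
SELECT * FROM user_vector_lang;
SELECT * FROM user_vector_abbrev_tokens;

-- Language-specific chunking with VECTOR_CHUNKS.
-- documentation_tab(id, text) is a hypothetical table used only for illustration.
SELECT d.id, c.chunk_offset, c.chunk_length, c.chunk_text
FROM documentation_tab d,
     VECTOR_CHUNKS(d.text
                   BY words
                   MAX 50
                   OVERLAP 0
                   SPLIT BY sentence
                   LANGUAGE indonesian
                   NORMALIZE all) c;

-- The same idea with UTL_TO_CHUNKS, passing the language in the JSON parameters.
SELECT c.*
FROM documentation_tab d,
     DBMS_VECTOR_CHAIN.UTL_TO_CHUNKS(
         d.text,
         JSON('{"by"        : "words",
                "max"       : "50",
                "overlap"   : "0",
                "split"     : "sentence",
                "language"  : "indonesian",
                "normalize" : "all"}')) c;
```

Splitting by sentence is chosen here because sentence-boundary detection is where the abbreviation tokens are expected to matter.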

Syntax

DBMS_VECTOR_CHAIN.CREATE_LANG_DATA (
    PARAMS       IN JSON default NULL
);

PARAMS

Specify the input parameters in JSON format:
{
    table_name, 
    column_name, 
    language,
    preference_name
}

Table 12-19 Parameter Details

| Parameter | Description | Required | Default Value |
|---|---|---|---|
| table_name | Name of the table (along with the optional table owner) in which you want to load the language data | Yes | No value |
| column_name | Column name in the language data table in which you want to load the language data | Yes | No value |
| language | Any supported language name, as listed in Supported Languages and Data File Locations | Yes | No value |
| preference_name | User-specified preference name for this language data | Yes | No value |

Example

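declare
    -- Source table and column holding the abbreviation tokens, the target
    -- language, and the name under which the language data is stored.
    params CLOB := '{"table_name"      : "eos_data_1",
                     "column_name"     : "token",
                     "language"        : "indonesian",
                     "preference_name" : "my_lang_1"}';
begin
    -- Convert the CLOB to JSON and load the language data.
    DBMS_VECTOR_CHAIN.CREATE_LANG_DATA(
        JSON (params));
end;
/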
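
The example above presumes that a table named eos_data_1 with a token column already exists and holds the abbreviation tokens. A minimal setup sketch, to be run before the procedure call, might look like this. The column type and the sample tokens (common Indonesian abbreviations, written here with their trailing period) are assumptions chosen for illustration.

```sql
-- Hypothetical source table for the abbreviation tokens; the column type is an assumption.
CREATE TABLE eos_data_1 (token NVARCHAR2(64));

-- Sample abbreviations that end with a period but do not end a sentence.
INSERT INTO eos_data_1 VALUES ('dll.');  -- dan lain-lain ("and others")
INSERT INTO eos_data_1 VALUES ('dsb.');  -- dan sebagainya ("and so on")
INSERT INTO eos_data_1 VALUES ('yth.');  -- yang terhormat ("respected", used in salutations)
COMMIT;
```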

End-to-end example:

To run an end-to-end example scenario using this procedure, see Create and Use Custom Language Data.