Create and Use Custom Vocabulary
Create and use your own vocabulary of tokens when chunking data.
Here, you use the chunker helper function CREATE_VOCABULARY from the DBMS_VECTOR_CHAIN package to load a custom vocabulary. The vocabulary file contains a list of tokens recognized by your vector embedding model's tokenizer.
After loading the token vocabulary, you can use the BY VOCABULARY chunking mode (with VECTOR_CHUNKS or UTL_TO_CHUNKS) to split data by counting the number of tokens.
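To make the idea of splitting by token count concrete, the following is a minimal Python sketch that runs outside the database. It is not Oracle's implementation. It assumes a BERT-style WordPiece vocabulary in which continuation pieces carry a "##" prefix, and the names VOCAB, count_tokens, and chunk_by_vocabulary are hypothetical, introduced only for illustration.

```python
# Hypothetical, tiny stand-in for a tokenizer vocabulary file (one token per line).
# Assumption: BERT-style WordPiece, where "##" marks a continuation piece.
VOCAB = {"vector", "search", "chunk", "data", "##ing", "##s"}


def count_tokens(word, vocab):
    """Count tokens in one word using greedy longest-match-first lookup."""
    count, start = 0, 0
    while start < len(word):
        end = len(word)
        # Shrink the candidate piece until it appears in the vocabulary.
        while end > start:
            piece = word[start:end] if start == 0 else "##" + word[start:end]
            if piece in vocab:
                break
            end -= 1
        if end == start:
            # No piece matched: treat the whole word as one unknown token.
            return 1
        count += 1
        start = end
    return count


def chunk_by_vocabulary(text, vocab, max_tokens):
    """Group whitespace-separated words into chunks of at most max_tokens tokens.

    A single word that alone exceeds max_tokens is kept whole in its own chunk.
    Text is lowercased first, which assumes an uncased vocabulary.
    """
    chunks, current, used = [], [], 0
    for word in text.lower().split():
        n = count_tokens(word, vocab)
        if current and used + n > max_tokens:
            chunks.append(" ".join(current))
            current, used = [], 0
        current.append(word)
        used += n
    if current:
        chunks.append(" ".join(current))
    return chunks
```

For instance, with this toy vocabulary the word "chunking" counts as two tokens ("chunk" and "##ing"), so a word-based limit and a token-based limit can produce different chunk boundaries for the same text. That difference is why the vocabulary should match the tokenizer of the embedding model you plan to use.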