Create Text Embeddings in Generative AI
Use the Cohere Embed models in OCI Generative AI to convert text to vector embeddings to use in applications for semantic searches, text classification, or text clustering.
Input data for text embeddings must meet the following requirements:
- You can add sentences, phrases, or paragraphs for embeddings either one phrase at a time, or by uploading a file.
- Only files with a .txt extension are allowed.
- If you use an input file, each input sentence, phrase, or paragraph in the file must be separated by a newline character.
- A maximum of 96 inputs are allowed for each run.
- In the Console, each input must be less than 512 tokens for the text-only models.
- If an input is too long, choose whether to cut off the start or the end of the text to fit within the token limit by setting the Truncate parameter to Start or End. If an input exceeds the 512-token limit and the Truncate parameter is set to None, you get an error message.
- For the text and image models, the files and inputs combined can add up to 128,000 tokens.
- For the text and image embed models, such as Cohere Embed English Image V3, you can add either text or one image, but not both. Image input is available only through the API, not in the Console. For the API, input one base64-encoded image in each run. For example, a 512 x 512 image converts to about 1,610 tokens.
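The limits above can be checked before you submit a request. The following is a minimal sketch, using only the Python standard library; the function names are illustrative, and the 96-input cap and newline-separated .txt format come from the requirements listed here.

```python
import base64

MAX_INPUTS = 96  # per-run input limit from the requirements above


def load_inputs(path: str) -> list[str]:
    """Read a .txt file with one sentence, phrase, or paragraph per line."""
    if not path.endswith(".txt"):
        raise ValueError("Only files with a .txt extension are allowed")
    with open(path, encoding="utf-8") as f:
        # One input per line; blank lines are skipped.
        inputs = [line.strip() for line in f if line.strip()]
    if len(inputs) > MAX_INPUTS:
        raise ValueError(f"At most {MAX_INPUTS} inputs per run; got {len(inputs)}")
    return inputs


def encode_image(path: str) -> str:
    """Base64-encode an image for the text and image embed models (API only)."""
    with open(path, "rb") as f:
        return base64.b64encode(f.read()).decode("ascii")
```

Token-limit enforcement is left to the service here, because counting tokens exactly requires the model's own tokenizer.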
To create embeddings for text, use the embed-text-result operation.
Enter the following command for a list of options to create text embeddings.
oci generative-ai-inference embed-text-result embed-text -h
For a complete list of parameters and values for the OCI Generative AI CLI commands, see Generative AI Inference CLI and Generative AI Management CLI.
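A full invocation might look like the sketch below. The flag names are assumptions based on the generated OCI CLI; confirm them with the `-h` command above before running, and replace the placeholder OCID and model name with your own values.

```shell
# Hedged sketch: assembling an embed-text CLI call. Flags and the model
# name are assumptions; the OCID is a placeholder.
COMPARTMENT_ID="ocid1.compartment.oc1..example"
MODEL_ID="cohere.embed-english-v3.0"

CMD="oci generative-ai-inference embed-text-result embed-text \
  --compartment-id $COMPARTMENT_ID \
  --serving-mode '{\"servingType\": \"ON_DEMAND\", \"modelId\": \"$MODEL_ID\"}' \
  --inputs '[\"What is a vector embedding?\"]' \
  --truncate END"

# Preview only; run the command once the flags are confirmed with -h.
echo "$CMD"
```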
Run the EmbedText operation to create text embeddings.
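As an illustration of what an EmbedText request carries, the sketch below builds the JSON body with the standard library only. The field names (`inputs`, `compartmentId`, `servingMode`, `truncate`) are assumptions to verify against the REST API documentation, and the OCID and model name are placeholders.

```python
import json


def build_embed_text_body(inputs, compartment_id, model_id, truncate="NONE"):
    """Assemble a JSON body for the EmbedText operation (field names assumed)."""
    if not 1 <= len(inputs) <= 96:
        raise ValueError("EmbedText accepts 1 to 96 inputs per run")
    return json.dumps({
        "inputs": inputs,
        "compartmentId": compartment_id,
        "servingMode": {"servingType": "ON_DEMAND", "modelId": model_id},
        "truncate": truncate,  # NONE, START, or END
    })


body = build_embed_text_body(
    ["What is a vector embedding?"],
    "ocid1.compartment.oc1..example",  # placeholder OCID
    "cohere.embed-english-v3.0",       # assumed model name
)
```

Signing and sending the request is handled by the SDKs; see the references at the end of this section.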
For information about using the API and signing requests, see REST API documentation and Security Credentials. For information about SDKs, see SDKs and the CLI.