Create Text Embeddings in Generative AI
Use the Cohere Embed models in OCI Generative AI to convert text to vector embeddings to use in applications for semantic searches, text classification, or text clustering.
Input data for text embeddings must meet the following requirements:
- You can add sentences, phrases, or paragraphs for embeddings either one phrase at a time, or by uploading a file.
- Only files with a .txt extension are allowed.
- If you use an input file, each input sentence, phrase, or paragraph in the file must be separated by a newline character.
- A maximum of 96 inputs are allowed for each run.
- In the Console, each input must be less than 512 tokens for the text-only models.
- If an input is too long, choose whether to cut off the start or the end of the text to fit within the token limit by setting the Truncate parameter to Start or End. If an input exceeds the 512-token limit and the Truncate parameter is set to None, you get an error message.
- For the text and image models, the files and inputs combined can add up to 128,000 tokens.
- For the text and image embed models, such as Cohere Embed English Image V3, you can add either text or one image, but not both. Image input is available only through the API, not in the Console. For the API, input one base64-encoded image in each run. For example, a 512 x 512 image converts to about 1,610 tokens.
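The limits above can be checked before you submit a request. The following is a minimal sketch, using only the Python standard library; the function names are illustrative, and the 96-input cap and newline-separated .txt format come from the requirements listed here.

```python
import base64

MAX_INPUTS = 96  # per-run input limit from the requirements above


def load_inputs(path: str) -> list[str]:
    """Read a .txt file with one sentence, phrase, or paragraph per line."""
    if not path.endswith(".txt"):
        raise ValueError("Only files with a .txt extension are allowed")
    with open(path, encoding="utf-8") as f:
        # One input per line; blank lines are skipped.
        inputs = [line.strip() for line in f if line.strip()]
    if len(inputs) > MAX_INPUTS:
        raise ValueError(f"At most {MAX_INPUTS} inputs per run; got {len(inputs)}")
    return inputs


def encode_image(path: str) -> str:
    """Base64-encode an image for the text and image embed models (API only)."""
    with open(path, "rb") as f:
        return base64.b64encode(f.read()).decode("ascii")
```

Token-limit enforcement is left to the service here, because counting tokens exactly requires the model's own tokenizer.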
To create embeddings for text, use the embed-text-result operation.
Enter the following command for a list of options to create text embeddings.
oci generative-ai-inference embed-text-result embed-text -h
For a complete list of parameters and values for the OCI Generative AI CLI commands, see Generative AI Inference CLI and Generative AI Management CLI.
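A full invocation might look like the sketch below. The flag names are assumptions based on the generated OCI CLI; confirm them with the `-h` command above before running, and replace the placeholder OCID and model name with your own values.

```shell
# Hedged sketch: assembling an embed-text CLI call. Flags and the model
# name are assumptions; the OCID is a placeholder.
COMPARTMENT_ID="ocid1.compartment.oc1..example"
MODEL_ID="cohere.embed-english-v3.0"

CMD="oci generative-ai-inference embed-text-result embed-text \
  --compartment-id $COMPARTMENT_ID \
  --serving-mode '{\"servingType\": \"ON_DEMAND\", \"modelId\": \"$MODEL_ID\"}' \
  --inputs '[\"What is a vector embedding?\"]' \
  --truncate END"

# Preview only; run the command once the flags are confirmed with -h.
echo "$CMD"
```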
Run the EmbedText operation to create text embeddings.
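As an illustration of what an EmbedText request carries, the sketch below builds the JSON body with the standard library only. The field names (`inputs`, `compartmentId`, `servingMode`, `truncate`) are assumptions to verify against the REST API documentation, and the OCID and model name are placeholders.

```python
import json


def build_embed_text_body(inputs, compartment_id, model_id, truncate="NONE"):
    """Assemble a JSON body for the EmbedText operation (field names assumed)."""
    if not 1 <= len(inputs) <= 96:
        raise ValueError("EmbedText accepts 1 to 96 inputs per run")
    return json.dumps({
        "inputs": inputs,
        "compartmentId": compartment_id,
        "servingMode": {"servingType": "ON_DEMAND", "modelId": model_id},
        "truncate": truncate,  # NONE, START, or END
    })


body = build_embed_text_body(
    ["What is a vector embedding?"],
    "ocid1.compartment.oc1..example",  # placeholder OCID
    "cohere.embed-english-v3.0",       # assumed model name
)
```

Signing and sending the request is handled by the SDKs; see the references at the end of this section.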
For information about using the API and signing requests, see REST API documentation and Security Credentials. For information about SDKs, see SDKs and the CLI.