Model Limitations in Generative AI
Review the following model requirements for the OCI Generative AI custom and base models to get the most out of your models.
For key features of the pretrained base models, see Pretrained Foundational Models in Generative AI.
Matching Base Models to Clusters
To host an OCI Generative AI pretrained or custom model on a hosting dedicated AI cluster, go to Pretrained Foundational Models in Generative AI. Then, select the pretrained model or the custom model's base model. On the Dedicated AI Cluster for the Model section of the page, see the unit size and required units for hosting that foundational model.
Adding Endpoints to Hosting Clusters
To host a model for inference on a hosting dedicated AI cluster, you must create an endpoint for that model. You can then add either a custom model or a pretrained foundational model to that endpoint.
About Endpoint Aliases and Stack Serving
A hosting dedicated AI cluster can have up to 50 endpoints. Use these endpoints for the following use cases:
- Creating endpoint aliases: Create aliases with many endpoints. These 50 endpoints must all point either to the same base model or to the same version of a custom model. Creating many endpoints that point to the same model makes the endpoints easier to manage, because you can use different endpoints for different users or different purposes.
- Stack serving: Host several versions of a custom model on one cluster. This applies to cohere.command and cohere.command-light models that are fine-tuned with the T-Few training method. Hosting various versions of a fine-tuned model can help you assess the custom models for different use cases.
To increase the call volume supported by a hosting cluster, you can increase its instance count.
Expand the following sections to review the requirements for hosting models on the same cluster.
Some OCI Generative AI foundational pretrained base models supported for the dedicated serving mode are now deprecated and will retire no sooner than six months after the release of the first replacement model. You can host a base model, or fine-tune a base model and host the fine-tuned model on a dedicated AI cluster (dedicated serving mode), until the base model is retired. For dedicated serving mode retirement dates, see Retiring the Models.
For hosting pretrained base chat models or fine-tuned chat models on a hosting dedicated AI cluster, use the following cluster unit size and endpoint rules that match each base model.
Hosting Cluster Unit Size | Matching Rules
---|---
Large Generic 2 for the base model, meta.llama-4-maverick-17b-128e-instruct-fp8 | Hosting base models: You can host the same pretrained base model through several endpoints on the same cluster. Hosting custom models: Fine-tuning isn't available for the meta.llama-4-maverick-17b-128e-instruct-fp8 model.
Large Generic V2 for the base model, meta.llama-4-scout-17b-16e-instruct | Hosting base models: You can host the same pretrained base model through several endpoints on the same cluster. Hosting custom models: Fine-tuning isn't available for the meta.llama-4-scout-17b-16e-instruct model.
LARGE_COHERE_V3 for the base model, cohere.command-a-03-2025 | Hosting base models: You can host the same pretrained base model through several endpoints on the same cluster. Hosting custom models: Fine-tuning isn't available for the cohere.command-a-03-2025 model.
Small Generic V2 for the base model, meta.llama-3.2-11b-vision-instruct | Hosting base models: You can host the same pretrained base model through several endpoints on the same cluster. Hosting custom models: Fine-tuning isn't available for the meta.llama-3.2-11b-vision-instruct model.
Large Generic for the base model, meta.llama-3.3-70b-instruct | Hosting base models: You can host the same pretrained base model through several endpoints on the same cluster. Hosting custom models: You can host several custom models on the same cluster.
Large Generic for the base model, meta.llama-3.1-70b-instruct | Hosting base models: You can host the same pretrained base model through several endpoints on the same cluster. Hosting custom models: You can host several custom models on the same cluster.
Large Generic for the base model, meta.llama-3-70b-instruct | Hosting base models: You can host the same pretrained base model through several endpoints on the same cluster. Hosting custom models: You can host several custom models on the same cluster.
Large Generic V2 for the base model, meta.llama-3.2-90b-vision-instruct | Hosting base models: You can host the same pretrained base model through several endpoints on the same cluster. Hosting custom models: Fine-tuning isn't available for the meta.llama-3.2-90b-vision-instruct model.
Large Generic 2 for the base model, meta.llama-3.1-405b-instruct | Hosting base models: You can host the same pretrained base model through several endpoints on the same cluster. Hosting custom models: Fine-tuning isn't available for the meta.llama-3.1-405b-instruct model.
Small Cohere V2 for the base model, cohere.command-r-16k (deprecated) | Hosting base models: You can host the same pretrained base model through several endpoints on the same cluster. Hosting custom models: You can host several custom models on the same cluster, but you can't host different versions of a custom model trained on the cohere.command-r-16k base model.
Small Cohere V2 for the base model, cohere.command-r-08-2024 | Hosting base models: You can host the same pretrained base model through several endpoints on the same cluster. Hosting custom models: You can host several custom models on the same cluster, but you can't host different versions of a custom model trained on the cohere.command-r-08-2024 base model.
Large Cohere V2_2 for the base model, cohere.command-r-plus (deprecated) | Hosting base models: You can host the same pretrained base model through several endpoints on the same cluster. Hosting custom models: Fine-tuning isn't available for the cohere.command-r-plus model.
Large Cohere V2_2 for the base model, cohere.command-r-plus-08-2024 | Hosting base models: You can host the same pretrained base model through several endpoints on the same cluster. Hosting custom models: Fine-tuning isn't available for the cohere.command-r-plus-08-2024 model.
For hosting the rerank model on a hosting dedicated AI cluster, use the following cluster unit size and endpoint rules.
Hosting Cluster Unit Size | Matching Rules
---|---
RERANK_COHERE | Hosting base models: You can host the same pretrained base model through several endpoints on the same cluster. Hosting custom models: Fine-tuning isn't available for the Cohere Rerank model.
For hosting the embedding models on a hosting dedicated AI cluster, use the following cluster unit size and endpoint rules.
Hosting Cluster Unit Size | Matching Rules
---|---
Embed Cohere | Hosting base models: You can host the same pretrained base model through several endpoints on the same cluster. Hosting custom models: Fine-tuning isn't available for the Cohere Embed models.
- Not available on-demand: All OCI Generative AI foundational pretrained models supported for the on-demand serving mode that use the text generation and summarization APIs (including the playground) are now retired. We recommend that you use the chat models instead.
- Can be hosted on clusters: If you host a summarization or a generation model such as cohere.command on a dedicated AI cluster (dedicated serving mode), you can continue to use that model until it's retired. These models, when hosted on a dedicated AI cluster, are available only in US Midwest (Chicago). See Retiring the Models for retirement dates and definitions.
To host the text generation models on a hosting dedicated AI cluster, use the following cluster unit size and endpoint rules that match your base model.
Hosting Cluster Unit Size | Matching Rules
---|---
Small Cohere for the base model, cohere.command-light | Hosting base models: You can host the same pretrained base model through several endpoints on the same cluster. Hosting custom models: You can host different custom models on the same cluster.
Large Cohere for the base model, cohere.command | Hosting base models: You can host the same pretrained base model through several endpoints on the same cluster. Hosting custom models: You can host different custom models on the same cluster.
Llama2 70 for the base model, meta.llama-2-70b-chat | Hosting base models: You can host the same pretrained base model through several endpoints on the same cluster.
The cohere.command model supported for the on-demand serving mode is now retired, and this model is deprecated for the dedicated serving mode. If you're hosting cohere.command on a dedicated AI cluster (dedicated serving mode) for summarization, you can continue to use this hosted model replica with the summarization API and in the playground until the cohere.command model retires for the dedicated serving mode. These models, when hosted on a dedicated AI cluster, are available only in US Midwest (Chicago). See Retiring the Models for retirement dates and definitions. We recommend that you use the chat models instead, which offer the same summarization capabilities, including control over summary length and style.
To host the pretrained cohere.command summarization model on a hosting dedicated AI cluster, use the following cluster unit size and endpoint rules.
Hosting Cluster Unit Size | Matching Rules
---|---
Large Cohere for the base model, cohere.command | Hosting base models: You can host the same pretrained base model through several endpoints on the same cluster. Hosting custom models: You can host different custom models on the same cluster.
Training Data
Datasets for training custom models have the following requirements:
- A maximum of one fine-tuning dataset is allowed per custom model. This dataset is randomly split in an 80:20 ratio for training and validation.
- Each file must have at least 32 prompt/completion pair examples.
- The file format is JSONL.
- Each line in the JSONL file has the following format: {"prompt": "<a prompt>", "completion": "<expected response given the prompt>"}\n
- The file must be stored in an OCI Object Storage bucket.
Learn about Training Data Requirements in Generative AI.
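The dataset rules above can be checked locally before you upload a file to Object Storage. The following is a minimal sketch using only the Python standard library; the function name and error messages are illustrative, not part of any OCI tooling:

```python
import json

MIN_EXAMPLES = 32  # each file needs at least 32 prompt/completion pairs

def validate_finetune_file(path):
    """Validate a JSONL fine-tuning dataset: one JSON object per line,
    with exactly the keys "prompt" and "completion"."""
    count = 0
    with open(path, encoding="utf-8") as f:
        for lineno, line in enumerate(f, start=1):
            if not line.strip():
                continue  # ignore blank lines
            record = json.loads(line)  # raises ValueError on malformed JSON
            if set(record) != {"prompt", "completion"}:
                raise ValueError(
                    f"line {lineno}: expected prompt/completion keys, "
                    f"got {sorted(record)}"
                )
            count += 1
    if count < MIN_EXAMPLES:
        raise ValueError(f"dataset has {count} examples; "
                         f"at least {MIN_EXAMPLES} required")
    return count
```

A file that passes this check still has to meet the service-side limits (one dataset per custom model, stored in an OCI Object Storage bucket).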
Input Data for Text Embeddings
Input data for creating text embeddings has the following requirements:
- You can add sentences, phrases, or paragraphs for embeddings either one phrase at a time or by uploading a file.
- Only files with a .txt extension are allowed.
- If you use an input file, each input sentence, phrase, or paragraph in the file must be separated by a newline character.
- A maximum of 96 inputs is allowed for each run.
- In the Console, each input must be less than 512 tokens for the text-only models.
- If an input is too long, choose whether to cut off the start or the end of the text to fit within the token limit by setting the Truncate parameter to Start or End. If an input exceeds the 512-token limit and the Truncate parameter is set to None, you get an error message.
- For the text and image models, the files and inputs can add up to at most 128,000 tokens.
- For the text and image embedding models, such as Cohere Embed English Image V3, you can add either text or one image only. Image input is available only through the API, not in the Console. For the API, input a base64-encoded image in each run. For example, a 512 x 512 image is converted to about 1,610 tokens.
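The 96-input and 512-token limits above can be handled client-side before calling the embedding API. The sketch below is an assumption-laden illustration: it approximates token counts by splitting on whitespace (real counts depend on the model's tokenizer), and the function names are hypothetical:

```python
MAX_INPUTS_PER_RUN = 96   # maximum inputs per embedding run
MAX_TOKENS = 512          # per-input limit for the text-only models

def truncate(text, limit=MAX_TOKENS, mode="End"):
    """Mimic the Truncate parameter: cut off the start or the end of an
    over-long input. Whitespace tokens stand in for real tokenizer counts."""
    tokens = text.split()
    if len(tokens) <= limit:
        return text
    if mode == "Start":
        return " ".join(tokens[-limit:])  # drop the start, keep the end
    if mode == "End":
        return " ".join(tokens[:limit])   # keep the start, drop the end
    # mode "None": over-limit input is an error, matching the Console behavior
    raise ValueError(f"input is {len(tokens)} tokens and Truncate is None")

def batches(lines):
    """Split newline-separated inputs into runs of at most 96."""
    inputs = [ln for ln in lines if ln.strip()]
    for i in range(0, len(inputs), MAX_INPUTS_PER_RUN):
        yield inputs[i:i + MAX_INPUTS_PER_RUN]
```

For example, a 200-line input file would be submitted as three runs of 96, 96, and 8 inputs.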
Learn about Creating text embeddings in OCI Generative AI.
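For image input through the API, the base64 payload mentioned above can be produced with the standard library alone. This sketch only shows the encoding step; the request field that carries the string depends on the embedding API you call:

```python
import base64

def image_to_base64(path):
    """Read an image file and return its contents base64-encoded as a
    string, the form the embedding API expects for image input
    (one image per run)."""
    with open(path, "rb") as f:
        return base64.b64encode(f.read()).decode("ascii")
```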