Compatible Models for Import
You can import open-source and third-party large language models from Hugging Face and OCI Object Storage buckets into OCI Generative AI. After importing a model, you can host it on a dedicated AI cluster, create an endpoint, and use it in the Generative AI service.
Imported models don’t require the 744 unit-hour minimum hosting commitment that applies when you host pretrained models available in OCI Generative AI on dedicated AI clusters.
Your use of these models may be subject to separate terms from the applicable third-party providers, and you are responsible for your compliance with such terms. Oracle disclaims all warranties, indemnities, and liabilities arising from or related to any open-source or third-party LLMs you import.
OCI Generative AI Imported Model Architecture
The OCI Generative AI service uses Open Model Engine (OME) to deploy and manage imported models. OME acts as the orchestration layer between the GPU and the inference runtime.
When you deploy an imported model, OME analyzes the model and pairs it with the most efficient runtime: either vLLM (optimized for high throughput) or SGLang (optimized for high performance). The vLLM and SGLang runtime engines run the models on the GPUs.
Some models are heavily optimized for SGLang (such as large-scale LLMs and those requiring RadixAttention for long-context memory), while others have better community kernels in vLLM (such as popular open-source LLMs and multimodal models).
While you can import any chat or embedding model, including fine-tuned models, that has been validated through Open Model Engine (with the vLLM or SGLang runtime), only models explicitly listed in the Compatible Models section have been assessed by Oracle against open-source model runtimes and tested on Oracle-supported GPU configurations. Notwithstanding the foregoing, Oracle is not responsible for any issues related to the performance, availability, operation, or security of Compatible Models. Unlisted models might have compatibility issues, so we recommend that you test any unlisted model before production use.
For available hardware and steps on how to deploy the imported models, see Managing Imported Models.
Compatible Models
- Alibaba Qwen
Supports advanced multilingual and multimodal use cases.
- DeepSeek
Optimized for coding, math, and complex reasoning.
- Google Gemma
Built for broad language processing and general-purpose use cases.
- Meta Llama
Models with enhanced Grouped Query Attention (GQA) for improved performance.
- Microsoft Phi
Compact and efficient models for scalable deployments.
- Mistral
Includes embedding and chat models. The embedding model is suited for efficient long-context handling.
- NVIDIA Nemotron
Open-weight models with published training data and recipes, suited for building specialized AI agents.
- OpenAI GptOss
Open-weight Mixture-of-Experts (MoE) models for efficient reasoning and large-context handling.
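The Compatible Models list is grouped by model family, and Oracle recommends testing any unlisted model before production use. As a rough pre-import check, the following Python sketch reads a downloaded Hugging Face model's `config.json` and matches its `architectures` entries against keywords derived from the family names above. The keyword table and the helpers `listed_family` and `check_model_dir` are hypothetical illustrations, not part of OCI Generative AI or OME. A match only means the family appears in the list, not that the exact model or version has been assessed by Oracle.

```python
import json
from pathlib import Path
from typing import Optional

# Hypothetical keyword table derived from the family names in the
# Compatible Models list; this is NOT an official Oracle mapping.
LISTED_FAMILY_KEYWORDS = {
    "qwen": "Alibaba Qwen",
    "deepseek": "DeepSeek",
    "gemma": "Google Gemma",
    "llama": "Meta Llama",
    "phi": "Microsoft Phi",
    "mistral": "Mistral",
    "nemotron": "NVIDIA Nemotron",
    "gptoss": "OpenAI GptOss",
}


def listed_family(config: dict) -> Optional[str]:
    """Return the listed family whose keyword appears in any entry of the
    model's `architectures` field, or None if nothing matches."""
    for arch in config.get("architectures", []):
        name = arch.lower()
        for keyword, family in LISTED_FAMILY_KEYWORDS.items():
            if keyword in name:
                return family
    return None


def check_model_dir(model_dir: str) -> str:
    """Read config.json from a local model directory and report whether its
    architecture belongs to a listed family (heuristic only)."""
    config = json.loads((Path(model_dir) / "config.json").read_text())
    family = listed_family(config)
    if family is None:
        return "unlisted family: test before production use"
    return f"{family} family is listed: confirm the exact model is compatible"
```

For example, `listed_family({"architectures": ["Qwen2ForCausalLM"]})` returns `"Alibaba Qwen"`, while a config whose architecture is `GPT2LMHeadModel` returns `None` and would be flagged for testing. Substring matching keeps the sketch short but is deliberately coarse; the Compatible Models section remains the authoritative reference.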