Use Meta Llama 4 in OCI Generative AI

OCI Generative AI now supports the Meta Llama 4 models, Scout and Maverick, on the Oracle Cloud Infrastructure (OCI) Generative AI service. These models use a Mixture of Experts (MoE) architecture for efficient, high-throughput processing. Optimized for multimodal understanding, multilingual tasks, coding, tool calling, and powering agentic systems, the Llama 4 series brings new possibilities to enterprise AI applications.

Key Highlights
  • Multimodal Capabilities: Both models are natively multimodal, capable of processing and integrating various data types, including text and images.
  • Multilingual Support: Trained on data encompassing 200 languages, with fine-tuning support for 12 languages: Arabic, English, French, German, Hindi, Indonesian, Italian, Portuguese, Spanish, Tagalog, Thai, and Vietnamese. Image understanding is limited to English.
  • Efficient Deployment: Llama 4 Scout is designed for accessibility with a smaller GPU footprint.
  • Knowledge Cutoff: August 2024
  • Usage Restrictions: The Llama 4 Acceptable Use Policy restricts use of these models in the European Union (EU).
  • Available for on-demand inferencing and dedicated hosting.
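For on-demand inferencing, requests go to the service's chat endpoint with an on-demand serving mode. The sketch below assembles such a request body using only the standard library; the region, API version path, compartment OCID, and model ID strings are illustrative assumptions, not values confirmed by this page — check the console or API reference for the exact identifiers.

```python
import json

# Hypothetical endpoint shape for the OCI Generative AI inference API;
# the region and API version segment are assumptions for illustration.
REGION = "us-chicago-1"  # Llama 4 on-demand is listed for US Midwest (Chicago)
ENDPOINT = f"https://inference.generativeai.{REGION}.oci.oraclecloud.com/20231130/actions/chat"

def build_chat_request(compartment_id: str, model_id: str, prompt: str) -> dict:
    """Assemble a generic-format chat request body for an on-demand model."""
    return {
        "compartmentId": compartment_id,
        "servingMode": {"servingType": "ON_DEMAND", "modelId": model_id},
        "chatRequest": {
            "apiFormat": "GENERIC",
            "messages": [
                {"role": "USER", "content": [{"type": "TEXT", "text": prompt}]}
            ],
            "maxTokens": 512,
        },
    }

body = build_chat_request(
    "ocid1.compartment.oc1..example",       # hypothetical compartment OCID
    "meta.llama-4-scout-17b-16e-instruct",  # hypothetical model ID
    "Summarize the Llama 4 model family.",
)
print(json.dumps(body, indent=2))
```

In practice the body would be sent with an OCI-signed POST (for example via the OCI Python SDK, which handles request signing for you); dedicated AI cluster deployments use a dedicated serving mode that references the hosting endpoint instead of a model ID.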
Available Regions
  • US Midwest (Chicago) (on-demand and dedicated AI clusters)
  • Brazil East (Sao Paulo) (dedicated AI clusters)
  • Japan Central (Osaka) (dedicated AI clusters)
  • UK South (London) (dedicated AI clusters)
Meta Llama 4 Scout
  • Architecture: Features 17 billion active parameters within a total of about 109 billion parameters, using 16 experts.
  • Context Window: Supports a context length of 192k tokens.
  • Deployment: Designed for efficient operation on a small GPU footprint.
  • Performance: Shows superior performance compared to previous models across many benchmarks.
Meta Llama 4 Maverick
  • Architecture: Like Meta Llama 4 Scout, this model features 17 billion active parameters, but within a larger framework of about 400 billion total parameters, using 128 experts.
  • Context Window: Supports a context length of 512k tokens.
  • Performance: Matches advanced models in coding and reasoning tasks.
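The differing context windows (192k tokens for Scout, 512k for Maverick) suggest a simple pre-flight check before choosing a model. The sketch below is a rough heuristic, not part of the service: the 4-characters-per-token ratio is an assumed approximation, and real usage should count tokens with the model's tokenizer.

```python
from typing import Optional

# Context windows as stated for the OCI-hosted Llama 4 models, in tokens.
CONTEXT_WINDOWS = {
    "llama-4-scout": 192_000,
    "llama-4-maverick": 512_000,
}

def estimate_tokens(text: str) -> int:
    """Crude token estimate: roughly 4 characters per token (assumption)."""
    return max(1, len(text) // 4)

def pick_model(text: str, reserve_for_output: int = 4_096) -> Optional[str]:
    """Return the smallest model whose context window fits the prompt
    plus an output budget, or None if neither window is large enough."""
    needed = estimate_tokens(text) + reserve_for_output
    for model, window in sorted(CONTEXT_WINDOWS.items(), key=lambda kv: kv[1]):
        if needed <= window:
            return model
    return None

print(pick_model("Summarize this report."))  # small prompt -> llama-4-scout
```

A very long prompt (say, a few million characters) would exceed even Maverick's window and return None, signaling that the input needs chunking or summarization first.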

Important Note: Before you use this model, review Meta's Llama 4 Acceptable Use Policy.

For a list of offered models and their regions, see Pretrained Foundational Models in Generative AI. For information about the service, see the Generative AI documentation.