Use Meta Llama 4 in OCI Generative AI

OCI Generative AI now supports the Meta Llama 4 models, Scout and Maverick, on the Oracle Cloud Infrastructure (OCI) Generative AI service. These models use a Mixture of Experts (MoE) architecture for efficient, high-throughput processing. Optimized for multimodal understanding, multilingual tasks, coding, tool calling, and powering agentic systems, the Llama 4 series brings new possibilities to enterprise AI applications.

Key Highlights
  • Multimodal Capabilities: Both models are natively multimodal, capable of processing and integrating various data types, including text and images.
  • Multilingual Support: Trained on data encompassing 200 languages, with fine-tuning support for 12 languages: Arabic, English, French, German, Hindi, Indonesian, Italian, Portuguese, Spanish, Tagalog, Thai, and Vietnamese. Image understanding is limited to English.
  • Efficient Deployment: Llama 4 Scout is designed for accessibility with a smaller GPU footprint.
  • Knowledge Cutoff: August 2024
  • Usage Restrictions: The Llama 4 Acceptable Use Policy restricts use of these models in the European Union (EU).
  • Available for on-demand inferencing and dedicated hosting.
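For on-demand inferencing, requests go to the service's chat endpoint with an on-demand serving mode. The sketch below assembles such a request body using only the standard library; the region, API version path, compartment OCID, and model ID strings are illustrative assumptions, not values confirmed by this page — check the console or API reference for the exact identifiers.

```python
import json

# Hypothetical endpoint shape for the OCI Generative AI inference API;
# the region and API version segment are assumptions for illustration.
REGION = "us-chicago-1"  # Llama 4 on-demand is listed for US Midwest (Chicago)
ENDPOINT = f"https://inference.generativeai.{REGION}.oci.oraclecloud.com/20231130/actions/chat"

def build_chat_request(compartment_id: str, model_id: str, prompt: str) -> dict:
    """Assemble a generic-format chat request body for an on-demand model."""
    return {
        "compartmentId": compartment_id,
        "servingMode": {"servingType": "ON_DEMAND", "modelId": model_id},
        "chatRequest": {
            "apiFormat": "GENERIC",
            "messages": [
                {"role": "USER", "content": [{"type": "TEXT", "text": prompt}]}
            ],
            "maxTokens": 512,
        },
    }

body = build_chat_request(
    "ocid1.compartment.oc1..example",       # hypothetical compartment OCID
    "meta.llama-4-scout-17b-16e-instruct",  # hypothetical model ID
    "Summarize the Llama 4 model family.",
)
print(json.dumps(body, indent=2))
```

In practice the body would be sent with an OCI-signed POST (for example via the OCI Python SDK, which handles request signing for you); dedicated AI cluster deployments use a dedicated serving mode that references the hosting endpoint instead of a model ID.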
Available Regions
  • US Midwest (Chicago) (on-demand and dedicated AI clusters)
  • Brazil East (Sao Paulo) (dedicated AI clusters)
  • Japan Central (Osaka) (dedicated AI clusters)
  • UK South (London) (dedicated AI clusters)
Meta Llama 4 Scout
  • Architecture: Features 17 billion active parameters within a total of about 109 billion parameters, using 16 experts.
  • Context Window: Supports a context length of 192k tokens.
  • Deployment: Designed for efficient operation on a small GPU footprint.
  • Performance: Shows superior performance compared to previous models across many benchmarks.
Meta Llama 4 Maverick
  • Architecture: Like Meta Llama 4 Scout, this model features 17 billion active parameters, but within a larger framework of about 400 billion total parameters, using 128 experts.
  • Context Window: Supports a context length of 512k tokens.
  • Performance: Matches advanced models in coding and reasoning tasks.
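The differing context windows (192k tokens for Scout, 512k for Maverick) suggest a simple pre-flight check before choosing a model. The sketch below is a rough heuristic, not part of the service: the 4-characters-per-token ratio is an assumed approximation, and real usage should count tokens with the model's tokenizer.

```python
from typing import Optional

# Context windows as stated for the OCI-hosted Llama 4 models, in tokens.
CONTEXT_WINDOWS = {
    "llama-4-scout": 192_000,
    "llama-4-maverick": 512_000,
}

def estimate_tokens(text: str) -> int:
    """Crude token estimate: roughly 4 characters per token (assumption)."""
    return max(1, len(text) // 4)

def pick_model(text: str, reserve_for_output: int = 4_096) -> Optional[str]:
    """Return the smallest model whose context window fits the prompt
    plus an output budget, or None if neither window is large enough."""
    needed = estimate_tokens(text) + reserve_for_output
    for model, window in sorted(CONTEXT_WINDOWS.items(), key=lambda kv: kv[1]):
        if needed <= window:
            return model
    return None

print(pick_model("Summarize this report."))  # small prompt -> llama-4-scout
```

A very long prompt (say, a few million characters) would exceed even Maverick's window and return None, signaling that the input needs chunking or summarization first.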

Important Note: Before you use this model, review Meta's Llama 4 Acceptable Use Policy.

For a list of offered models and their regions, see Pretrained Foundational Models in Generative AI. For information about the service, see the Generative AI documentation.