Local and Remote Model Support
Coherence RAG supports both local and remote models for embedding and scoring, allowing users to choose between on-premises execution and cloud-based inference.
Local models are automatically downloaded from Hugging Face and run on each cluster member using ONNX Runtime. These models provide optimized performance without requiring external API calls, ensuring cost efficiency, privacy, and independence from third-party services.
Remote models include OCI Generative AI, OpenAI, Cohere, Anthropic, AWS Bedrock, and any other provider supported by LangChain4J. These models offer access to state-of-the-art AI capabilities via API-based cloud services, providing higher accuracy and a broader model selection at the cost of external dependencies and API-related latency.
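As an illustration of the two options, the sketch below constructs one local and one remote embedding model using the LangChain4J API that Coherence RAG builds on. The local model (all-MiniLM-L6-v2, from the langchain4j-embeddings-all-minilm-l6-v2 artifact) runs in-process via ONNX Runtime; the remote one calls the OpenAI embeddings API. This is a minimal sketch of the underlying model types, not Coherence RAG's own configuration API, and the choice of model names here is an assumption for illustration.

```java
import dev.langchain4j.data.embedding.Embedding;
import dev.langchain4j.model.embedding.EmbeddingModel;
import dev.langchain4j.model.embedding.onnx.allminilml6v2.AllMiniLmL6V2EmbeddingModel;
import dev.langchain4j.model.openai.OpenAiEmbeddingModel;

public class EmbeddingModelChoice {
    public static void main(String[] args) {
        // Local model: all-MiniLM-L6-v2 as ONNX, executed in-process
        // on this JVM via ONNX Runtime -- no network calls, no API key.
        EmbeddingModel local = new AllMiniLmL6V2EmbeddingModel();

        // Remote model: OpenAI embeddings over HTTPS; requires an API key
        // and incurs the external dependency and latency noted above.
        // (Model name is an illustrative choice.)
        EmbeddingModel remote = OpenAiEmbeddingModel.builder()
                .apiKey(System.getenv("OPENAI_API_KEY"))
                .modelName("text-embedding-3-small")
                .build();

        // Both expose the same EmbeddingModel interface, so calling
        // code can swap local and remote implementations freely.
        Embedding vector = local.embed("Coherence RAG").content();
        System.out.println("local embedding dimension = " + vector.dimension());
    }
}
```

Because both implementations share the `EmbeddingModel` interface, switching between on-premises and cloud inference is a configuration decision rather than a code change.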