LLMs and Embedders

This page presents the abstract interfaces used to plug LLMs and embedders into Oracle Agent Memory.

LLM Interface

class `oracleagentmemory.apis.llms.ILlm`

Bases: ABC

Abstract interface for LLM invocation.

method `generate` (abstract)

Generate a response from an LLM synchronously.

Parameters:
- prompt str | Sequence[dict[str, str]] – Either a plain text prompt (treated as a single user message) or a chat-style list of messages, where each message is a mapping with at least a "content" key and optionally a "role".
- response_json_schema dict[str, Any] | None – Optional JSON Schema describing the expected response format.
- **kwargs (Any) – Provider-specific keyword arguments forwarded to the underlying backend.
Returns: Normalized LLM output.
Return type: LlmResponse

method `generate_async` (abstract, async)

Asynchronously generate a response from an LLM.

Parameters:
- prompt str | Sequence[dict[str, str]] – Either a plain text prompt (treated as a single user message) or a chat-style list of messages, where each message is a mapping with at least a "content" key and optionally a "role".
- response_json_schema dict[str, Any] | None – Optional JSON Schema describing the expected response format.
- **kwargs (Any) – Provider-specific keyword arguments forwarded to the underlying backend.
Returns: Normalized LLM output.
Return type: LlmResponse
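
A custom backend is plugged in by subclassing the interface and implementing both methods. The sketch below defines local stand-ins for `ILlm` and `LlmResponse` so that it runs without the package installed; an application would import the real classes from `oracleagentmemory.apis.llms`. The `EchoLlm` class, the default of `None` for `response_json_schema`, and the use of a worker thread for the async variant are illustrative assumptions.

```python
import asyncio
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Any


# Stand-ins mirroring the documented interface (assumption: exact signatures
# may differ). In an application, import these from oracleagentmemory.apis.llms.
@dataclass
class LlmResponse:
    text: str


class ILlm(ABC):
    @abstractmethod
    def generate(self, prompt, response_json_schema=None, **kwargs: Any) -> LlmResponse: ...

    @abstractmethod
    async def generate_async(self, prompt, response_json_schema=None, **kwargs: Any) -> LlmResponse: ...


class EchoLlm(ILlm):
    """Toy backend that echoes the last message; usable as a test double."""

    def generate(self, prompt, response_json_schema=None, **kwargs):
        last = prompt if isinstance(prompt, str) else prompt[-1]["content"]
        return LlmResponse(text=f"echo: {last}")

    async def generate_async(self, prompt, response_json_schema=None, **kwargs):
        # Run the blocking call in a worker thread so the event loop stays free.
        return await asyncio.to_thread(
            self.generate, prompt, response_json_schema, **kwargs
        )
```

Calling `EchoLlm().generate("hi")` returns an `LlmResponse` whose `text` is `"echo: hi"`. A real backend would replace the echo with a provider call and keep the same return type.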

LLM Responses

class `oracleagentmemory.apis.llms.LlmResponse`

Bases: object

A small normalized response returned by ILlm.

Parameters:
- text str – The primary generated text content.

text

The primary generated text content.

Type: str

Embedder Interface

class `oracleagentmemory.apis.IEmbedder`

Bases: ABC

Abstract interface for text embedders.

method `embed` (abstract)

Embed a batch of texts into a 2D float32 NumPy array.

Parameters:
- texts list[str] – Batch of texts to embed.
- is_query bool – Whether the batch is being embedded for query-time retrieval.
Returns: A 2D array shaped (len(texts), dim) with dtype=float32.
Return type: numpy.ndarray

method `embed_async` (abstract, async)

Asynchronously embed a batch of texts into a 2D float32 NumPy array.

Parameters:
- texts list[str] – Batch of texts to embed.
- is_query bool – Whether the batch is being embedded for query-time retrieval.
Returns: A 2D array shaped (len(texts), dim) with dtype=float32.
Return type: numpy.ndarray
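
The contract of both methods is a `(len(texts), dim)` matrix with `dtype=float32`. The sketch below shows one way to satisfy it with a deterministic toy embedder. The `IEmbedder` class here is a local stand-in so the code runs without the package; `HashEmbedder`, its `dim` argument, and the default `is_query=False` are assumptions made for illustration, and the resulting vectors carry no semantic meaning.

```python
import asyncio
import hashlib
from abc import ABC, abstractmethod

import numpy as np


# Stand-in mirroring the documented interface (assumption: the real default
# for is_query may differ). An application would import the real IEmbedder.
class IEmbedder(ABC):
    @abstractmethod
    def embed(self, texts: list[str], is_query: bool = False) -> np.ndarray: ...

    @abstractmethod
    async def embed_async(self, texts: list[str], is_query: bool = False) -> np.ndarray: ...


class HashEmbedder(IEmbedder):
    """Deterministic toy embedder built from SHA-256 digests."""

    def __init__(self, dim: int = 8) -> None:
        if not 1 <= dim <= 32:
            raise ValueError("dim must be between 1 and 32 (SHA-256 yields 32 bytes)")
        self.dim = dim

    def embed(self, texts, is_query=False):
        rows = np.zeros((len(texts), self.dim), dtype=np.float32)
        for i, text in enumerate(texts):
            digest = hashlib.sha256(text.encode("utf-8")).digest()
            # Map the first `dim` digest bytes to floats in [0, 1].
            rows[i] = np.frombuffer(digest[: self.dim], dtype=np.uint8) / 255.0
        return rows

    async def embed_async(self, texts, is_query=False):
        # Delegate to the synchronous method in a worker thread.
        return await asyncio.to_thread(self.embed, texts, is_query)
```

An empty batch yields an array of shape `(0, dim)`, which keeps the return value two-dimensional in every case.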

LiteLLM Adapters

class `oracleagentmemory.core.llms.LlmApiType`

Bases: str, Enum

Supported OpenAI-compatible API families for Llm.

CHAT_COMPLETIONS = 'chat_completions'

RESPONSES = 'responses'

class `oracleagentmemory.core.llms.Llm`

Bases: ILlm

Adapter for generating model responses.

Create an LLM adapter.

Parameters:
- model str – Model identifier sent to the underlying model provider.
- api_base str | None – Optional base URL for an OpenAI-compatible endpoint.
- api_key str | None – Optional API key used when contacting the provider.
- api_type LlmApiType – API family to call. Use LlmApiType.CHAT_COMPLETIONS for Chat Completions or LlmApiType.RESPONSES for the Responses API. Defaults to LlmApiType.CHAT_COMPLETIONS.
- stream bool – Whether to request streaming output. The stream is consumed internally and returned as a single LlmResponse.
- temperature float | None – Optional sampling temperature.
- max_tokens int | None – Optional output token limit. With api_type=LlmApiType.CHAT_COMPLETIONS this is sent as max_tokens. With api_type=LlmApiType.RESPONSES this is sent as max_output_tokens.
- reasoning_effort str | None – Optional reasoning effort. With api_type=LlmApiType.CHAT_COMPLETIONS this is sent as reasoning_effort. With api_type=LlmApiType.RESPONSES this is converted to reasoning={"effort": ...}.
- **default_kwargs (Any) – Advanced default keyword arguments applied to every call. Prefer the explicit parameters above for common connection and generation settings. When the same setting is provided both explicitly and in default_kwargs, the explicit parameter takes precedence.

Examples

OCI Generative AI models use LiteLLM’s "oci/..." model identifiers. A common setup is to pass OCI API key authentication details from the standard OCI config file through LiteLLM-specific keyword arguments. The OCI Python SDK is not installed by this package; applications that already depend on it may alternatively pass an oci_signer object.

import configparser
from pathlib import Path

from oracleagentmemory.core.llms import Llm

# Read API key authentication details from the standard OCI config file.
parser = configparser.RawConfigParser()
parser.read(Path("~/.oci/config").expanduser())
cfg = parser["DEFAULT"]
key_file = Path(cfg["key_file"]).expanduser()

oci_llm = Llm(
    model="oci/openai.gpt-oss-120b",
    oci_compartment_id="ocid1.compartment.oc1..example",
    oci_region=cfg.get("region", "us-chicago-1"),
    oci_user=cfg["user"],
    oci_fingerprint=cfg["fingerprint"],
    oci_tenancy=cfg["tenancy"],
    oci_key_file=str(key_file),
)
oci_llm.generate("Reply with OK.")

OpenAI-hosted models use LiteLLM model identifiers such as "openai/gpt-5.1" and an OpenAI API key. Chat Completions is the default API family.

from oracleagentmemory.core.llms import Llm

openai_llm = Llm(
    model="openai/gpt-5.1",
    api_key="sk-example",
    temperature=0,
    max_tokens=128,
)
openai_llm.model  # 'openai/gpt-5.1'
openai_llm.generate("Reply with OK.")

Use api_type=LlmApiType.RESPONSES when the target model should be called through the OpenAI Responses API instead of Chat Completions.

from oracleagentmemory.core.llms import Llm, LlmApiType

responses_llm = Llm(
    model="openai/gpt-5.4",
    api_key="sk-example",
    api_type=LlmApiType.RESPONSES,
    reasoning_effort="high",
    stream=True,
)
responses_llm.model  # 'openai/gpt-5.4'

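Self-hosted OpenAI-compatible servers, including vLLM, are called with an "openai/..." model identifier plus the server’s /v1 base URL. The leading "openai/" segment is the LiteLLM provider prefix; the rest of the identifier is the model name as served by the endpoint, which is why the example identifier contains "openai/" twice. Pass a nominal api_key such as "none" when the endpoint does not enforce authentication.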

from oracleagentmemory.core.llms import Llm

vllm_llm = Llm(
    model="openai/openai/gpt-oss-120b",
    api_base="http://localhost:8000/v1",
    api_key="none",
    stream=True,
)
vllm_llm.model  # 'openai/openai/gpt-oss-120b'
vllm_llm.generate("Reply with OK.")

method `generate`

Generate a response.

Parameters:
- prompt str | Sequence[dict[str, str]] – Prompt string or chat messages. A string is treated as a single user message.
- response_json_schema dict[str, Any] | None – Optional JSON Schema describing the expected response format. When provided, this method uses the provider-native structured output mechanism via OpenAI-compatible response_format.
- **kwargs (Any) – Additional call parameters sent with this request. Pass api_type=LlmApiType.RESPONSES to route this call through the Responses API.
Returns: Normalized LLM output.
Return type: LlmResponse

method `generate_async` (async)

Asynchronously generate a response.

Parameters:
- prompt str | Sequence[dict[str, str]] – Prompt string or chat messages. A string is treated as a single user message.
- response_json_schema dict[str, Any] | None – Optional JSON Schema describing the expected response format. When provided, this method uses the provider-native structured output mechanism via OpenAI-compatible response_format.
- **kwargs (Any) – Additional call parameters sent with this request. Pass api_type=LlmApiType.RESPONSES to route this call through the Responses API.
Returns: Normalized LLM output.
Return type: LlmResponse
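
When `response_json_schema` is supplied, the caller still receives an `LlmResponse` and has to decode the result. The sketch below assumes that the structured output arrives as a JSON document in `LlmResponse.text`, which this page does not state explicitly. The schema fields and the `parse_structured` helper are hypothetical, and the response text is hard-coded so the example runs without a provider.

```python
import json
from typing import Any

# JSON Schema for the expected reply; the field names are illustrative only.
FACT_SCHEMA: dict[str, Any] = {
    "type": "object",
    "properties": {
        "subject": {"type": "string"},
        "confidence": {"type": "number"},
    },
    "required": ["subject", "confidence"],
    "additionalProperties": False,
}


def parse_structured(text: str, schema: dict[str, Any]) -> dict[str, Any]:
    """Decode response text and check the schema's required keys (hypothetical helper)."""
    data = json.loads(text)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    missing = [key for key in schema.get("required", []) if key not in data]
    if missing:
        raise ValueError(f"missing required keys: {missing}")
    return data


# With a configured adapter, the text would come from a call such as:
#   response = llm.generate("Extract the fact.", response_json_schema=FACT_SCHEMA)
#   fact = parse_structured(response.text, FACT_SCHEMA)
fact = parse_structured('{"subject": "Oracle", "confidence": 0.9}', FACT_SCHEMA)
```

The helper checks only the required keys. Full validation against the schema would need a JSON Schema validator, which is outside the scope of this sketch.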

class `oracleagentmemory.core.embedders.Embedder`

Bases: IEmbedder

Provider-backed embedder.

Create a provider-backed embedder.

Parameters:
- model str – Model identifier sent to the underlying embedding provider.
- api_base str | None – Optional base URL for an OpenAI-compatible endpoint.
- api_key str | None – Optional API key used when contacting the provider.
- normalize bool – Whether to L2-normalize embeddings returned by the provider.
- query_prefix str | None – Optional prefix added only when embedding query texts.
- truncate_prompt_tokens int | None – Optional input token limit forwarded to providers that support truncating long embedding prompts.
- **default_kwargs (Any) – Advanced default keyword arguments applied to every embedding call. Prefer the explicit parameters above for common settings.

Examples

OCI Generative AI embedding models use "oci/..." model identifiers. A common setup is to pass OCI API key authentication details from the standard OCI config file through LiteLLM-specific keyword arguments. The OCI Python SDK is not installed by this package; applications that already depend on it may alternatively pass an oci_signer object.

import configparser
from pathlib import Path

from oracleagentmemory.core.embedders import Embedder

# Read API key authentication details from the standard OCI config file.
parser = configparser.RawConfigParser()
parser.read(Path("~/.oci/config").expanduser())
cfg = parser["DEFAULT"]
key_file = Path(cfg["key_file"]).expanduser()

oci_embedder = Embedder(
    model="oci/cohere.embed-english-v3.0",
    oci_compartment_id="ocid1.compartment.oc1..example",
    oci_region=cfg.get("region", "us-chicago-1"),
    oci_user=cfg["user"],
    oci_fingerprint=cfg["fingerprint"],
    oci_tenancy=cfg["tenancy"],
    oci_key_file=str(key_file),
)
oci_embedder.embed(["hello world"])

OpenAI-hosted embedding models use identifiers such as "openai/text-embedding-3-small" with an OpenAI API key.

from oracleagentmemory.core.embedders import Embedder

openai_embedder = Embedder(
    model="openai/text-embedding-3-small",
    api_key="sk-example",
    truncate_prompt_tokens=8192,
)
openai_embedder.model  # 'openai/text-embedding-3-small'
openai_embedder.embed(["hello world"])

Self-hosted OpenAI-compatible embedding servers, including vLLM, use the "hosted_vllm/..." provider prefix with the server’s /v1 base URL.

from oracleagentmemory.core.embedders import Embedder

vllm_embedder = Embedder(
    model="hosted_vllm/sentence-transformers/all-MiniLM-L6-v2",
    api_base="http://localhost:8000/v1",
)
vllm_embedder.model  # 'hosted_vllm/sentence-transformers/all-MiniLM-L6-v2'
vllm_embedder.embed(["hello world"])

method `embed`

Embed a batch of texts using the configured provider.

Parameters:
- texts list[str] – Batch of raw text strings to embed.
- is_query bool – Whether the texts in the batch are queries. Query texts are prefixed with query_prefix when one was configured.
Returns: A two-dimensional float32 matrix with the embedding vectors returned by the provider.
Return type: numpy.ndarray
Raises: RuntimeError – If the provider response payload does not include embedding data.
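
The conversion from a provider payload to the returned matrix, including the documented RuntimeError, might look like the sketch below. It assumes an OpenAI-style embeddings payload with a "data" list whose items carry "index" and "embedding" fields; `payload_to_matrix` is a hypothetical helper, not the embedder's actual code.

```python
import numpy as np


def payload_to_matrix(payload: dict) -> np.ndarray:
    """Convert an OpenAI-style embeddings payload to an (n, dim) float32 matrix."""
    items = payload.get("data")
    if not items:
        raise RuntimeError("provider response does not include embedding data")
    # Restore input order using the per-item index carried by the payload.
    ordered = sorted(items, key=lambda item: item.get("index", 0))
    return np.asarray([item["embedding"] for item in ordered], dtype=np.float32)
```

Sorting by index guards against providers that return items in a different order than the input batch, so row `i` of the matrix corresponds to `texts[i]`.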

method `embed_async` (async)

Asynchronously embed a batch of texts using the configured provider.

Parameters:
- texts list[str] – Batch of raw text strings to embed.
- is_query bool – Whether the texts in the batch are queries. Query texts are prefixed with query_prefix when one was configured.
Returns: A two-dimensional float32 matrix with the embedding vectors returned by the provider.
Return type: numpy.ndarray
Raises: RuntimeError – If the provider response payload does not include embedding data.
