LLMs and Embedders
This page presents the abstract interfaces used to plug LLMs and embedders into Oracle Agent Memory.
LLM Interface
class oracleagentmemory.apis.llms.ILlm
Bases: ABC
Abstract interface for LLM invocation.
method generate (abstract)
Generate a response from an LLM synchronously.
- Parameters:
- prompt
str | Sequence[dict[str, str]]– Either a plain text prompt (treated as a single user message) or a chat-style list of messages, where each message is a mapping with at least a"content"key and optionally a"role". - response_json_schema
dict[str, Any] | None– Optional JSON Schema describing the expected response format. - **kwargs (Any) – Provider-specific keyword arguments forwarded to the underlying backend.
- prompt
- Returns: Normalized LLM output.
- Return type: LlmResponse
method generate_async (abstract, async)
Asynchronously generate a response from an LLM.
- Parameters:
- prompt
str | Sequence[dict[str, str]]– Either a plain text prompt (treated as a single user message) or a chat-style list of messages, where each message is a mapping with at least a"content"key and optionally a"role". - response_json_schema
dict[str, Any] | None– Optional JSON Schema describing the expected response format. - **kwargs (Any) – Provider-specific keyword arguments forwarded to the underlying backend.
- prompt
- Returns: Normalized LLM output.
- Return type: LlmResponse
LLM Responses
class oracleagentmemory.apis.llms.LlmResponse
Bases: object
A small normalized response returned by ILlm.
- Parameters:
text
str
text
The primary generated text content.
- Type: str
Embedder Interface
class oracleagentmemory.apis.IEmbedder
Bases: ABC
Abstract interface for text embedders.
method embed (abstract)
Embed a batch of texts into a 2D float32 NumPy array.
- Parameters:
- texts
list[str]– Batch of texts to embed. - is_query
bool– Whether the batch is being embedded for query-time retrieval.
- texts
- Returns:
A 2D array shaped
(len(texts), dim)withdtype=float32. - Return type: numpy.ndarray
method embed_async (abstract, async)
Embed a batch of texts into a 2D float32 NumPy array.
- Parameters:
- texts
list[str]– Batch of texts to embed. - is_query
bool– Whether the batch is being embedded for query-time retrieval.
- texts
- Returns:
A 2D array shaped
(len(texts), dim)withdtype=float32. - Return type: numpy.ndarray
LiteLLM Adapters
class oracleagentmemory.core.llms.LlmApiType
Bases: str, Enum
Supported OpenAI-compatible API families for Llm.
CHAT_COMPLETIONS *= 'chat_completions'*
RESPONSES *= 'responses'*
class oracleagentmemory.core.llms.Llm
Bases: ILlm
Adapter for generating model responses.
Create an LLM adapter.
- Parameters:
- model
str– Model identifier sent to the underlying model provider. - api_base
str | None– Optional base URL for an OpenAI-compatible endpoint. - api_key
str | None– Optional API key used when contacting the provider. - api_type
LlmApiType– API family to call. UseLlmApiType.CHAT_COMPLETIONSfor Chat Completions orLlmApiType.RESPONSESfor the Responses API. Defaults toLlmApiType.CHAT_COMPLETIONS. - stream
bool– Whether to request streaming output. The stream is consumed internally and returned as a singleLlmResponse. - temperature
float | None– Optional sampling temperature. - max_tokens
int | None– Optional output token limit. Withapi_type=LlmApiType.CHAT_COMPLETIONSthis is sent asmax_tokens. Withapi_type=LlmApiType.RESPONSESthis is sent asmax_output_tokens. - reasoning_effort
str | None– Optional reasoning effort. Withapi_type=LlmApiType.CHAT_COMPLETIONSthis is sent asreasoning_effort. Withapi_type=LlmApiType.RESPONSESthis is converted toreasoning={"effort": ...}. - **default_kwargs (Any) – Advanced default keyword arguments applied to every call. Prefer
the explicit parameters above for common connection and generation
settings. When the same setting is provided both explicitly and in
default_kwargs, the explicit parameter takes precedence.
- model
Examples
OCI Generative AI models use LiteLLM’s "oci/..." model identifiers.
A common setup is to pass OCI API key authentication details from the
standard OCI config file through LiteLLM-specific keyword arguments.
The OCI Python SDK is not installed by this package; applications that
already depend on it may alternatively pass an oci_signer object.
import configparser
from pathlib import Path
parser = configparser.RawConfigParser()
parser.read(Path("~/.oci/config").expanduser())
cfg = parser["DEFAULT"]
key_file = Path(cfg["key_file"]).expanduser()
oci_llm = Llm(
model="oci/openai.gpt-oss-120b",
oci_compartment_id="ocid1.compartment.oc1..example",
oci_region=cfg.get("region", "us-chicago-1"),
oci_user=cfg["user"],
oci_fingerprint=cfg["fingerprint"],
oci_tenancy=cfg["tenancy"],
oci_key_file=str(key_file),
)
oci_llm.generate("Reply with OK.")
OpenAI-hosted models use LiteLLM model identifiers such as
"openai/gpt-5.1" and an OpenAI API key. Chat Completions is the
default API family.
openai_llm = Llm(
model="openai/gpt-5.1",
api_key="sk-example",
temperature=0,
max_tokens=128,
)
openai_llm.model
'openai/gpt-5.1'
openai_llm.generate("Reply with OK.")
Use api_type=LlmApiType.RESPONSES when the target model should be
called through the OpenAI Responses API instead of Chat Completions.
responses_llm = Llm(
model="openai/gpt-5.4",
api_key="sk-example",
api_type=LlmApiType.RESPONSES,
reasoning_effort="high",
stream=True,
)
responses_llm.model
'openai/gpt-5.4'
Self-hosted OpenAI-compatible servers, including vLLM, are called with
an "openai/..." model identifier plus the server’s /v1 base URL.
Pass a nominal api_key such as "none" when the endpoint does not
enforce authentication.
vllm_llm = Llm(
model="openai/openai/gpt-oss-120b",
api_base="http://localhost:8000/v1",
api_key="none",
stream=True,
)
vllm_llm.model
'openai/openai/gpt-oss-120b'
vllm_llm.generate("Reply with OK.")
method generate
Generate a response.
- Parameters:
- prompt
str | Sequence[dict[str, str]]– Prompt string or chat messages. A string is treated as a single user message. - response_json_schema
dict[str, Any] | None– Optional JSON Schema describing the expected response format. When provided, this method uses the provider-native structured output mechanism via OpenAI-compatibleresponse_format. - **kwargs (Any) – Additional call parameters sent with this request. Pass
api_type=LlmApiType.RESPONSESto route this call through the Responses API.
- prompt
- Returns: Normalized LLM output.
- Return type: LlmResponse
method generate_async (async)
Asynchronously generate a response.
- Parameters:
- prompt
str | Sequence[dict[str, str]]– Prompt string or chat messages. A string is treated as a single user message. - response_json_schema
dict[str, Any] | None– Optional JSON Schema describing the expected response format. When provided, this method uses the provider-native structured output mechanism via OpenAI-compatibleresponse_format. - **kwargs (Any) – Additional call parameters sent with this request. Pass
api_type=LlmApiType.RESPONSESto route this call through the Responses API.
- prompt
- Returns: Normalized LLM output.
- Return type: LlmResponse
class oracleagentmemory.core.embedders.Embedder
Bases: IEmbedder
Provider-backed embedder.
Create a provider-backed embedder.
- Parameters:
- model
str– Model identifier sent to the underlying embedding provider. - api_base
str | None– Optional base URL for an OpenAI-compatible endpoint. - api_key
str | None– Optional API key used when contacting the provider. - normalize
bool– Whether to L2-normalize embeddings returned by the provider. - query_prefix
str | None– Optional prefix added only when embedding query texts. - truncate_prompt_tokens
int | None– Optional input token limit forwarded to providers that support truncating long embedding prompts. - **default_kwargs (Any) – Advanced default keyword arguments applied to every embedding call. Prefer the explicit parameters above for common settings.
- model
Examples
OCI Generative AI embedding models use "oci/..." model identifiers.
A common setup is to pass OCI API key authentication details from the
standard OCI config file through LiteLLM-specific keyword arguments.
The OCI Python SDK is not installed by this package; applications that
already depend on it may alternatively pass an oci_signer object.
import configparser
from pathlib import Path
parser = configparser.RawConfigParser()
parser.read(Path("~/.oci/config").expanduser())
cfg = parser["DEFAULT"]
key_file = Path(cfg["key_file"]).expanduser()
oci_embedder = Embedder(
model="oci/cohere.embed-english-v3.0",
oci_compartment_id="ocid1.compartment.oc1..example",
oci_region=cfg.get("region", "us-chicago-1"),
oci_user=cfg["user"],
oci_fingerprint=cfg["fingerprint"],
oci_tenancy=cfg["tenancy"],
oci_key_file=str(key_file),
)
oci_embedder.embed(["hello world"])
OpenAI-hosted embedding models use identifiers such as
"openai/text-embedding-3-small" with an OpenAI API key.
openai_embedder = Embedder(
model="openai/text-embedding-3-small",
api_key="sk-example",
truncate_prompt_tokens=8192,
)
openai_embedder.model
'openai/text-embedding-3-small'
openai_embedder.embed(["hello world"])
Self-hosted OpenAI-compatible embedding servers, including vLLM, use
the "hosted_vllm/..." provider prefix with the server’s /v1
base URL.
vllm_embedder = Embedder(
model="hosted_vllm/sentence-transformers/all-MiniLM-L6-v2",
api_base="http://localhost:8000/v1",
)
vllm_embedder.model
'hosted_vllm/sentence-transformers/all-MiniLM-L6-v2'
vllm_embedder.embed(["hello world"])
method embed
Embed a batch of texts using the configured provider.
- Parameters:
- texts
list[str]– Batch of raw text strings to embed. - is_query
bool– Whether the text is a query. Query texts receivequery_prefixwhen one was configured.
- texts
- Returns:
A two-dimensional
float32matrix with the embedding vectors returned by the provider. - Return type: numpy.ndarray
- Raises: RuntimeError – If the provider response payload does not include embedding data.
method embed_async (async)
Asynchronously embed a batch of texts using the configured provider.
- Parameters:
- texts
list[str]– Batch of raw text strings to embed. - is_query
bool– Whether the text is a query. Query texts receivequery_prefixwhen one was configured.
- texts
- Returns:
A two-dimensional
float32matrix with the embedding vectors returned by the provider. - Return type: numpy.ndarray
- Raises: RuntimeError – If the provider response payload does not include embedding data.