Quick Start
The easiest way to get started with Coherence RAG is to deploy the pre-built `coherence-rag-server` container image to Kubernetes using the [Coherence Operator](https://oracle.github.io/coherence-operator/docs/latest/#/docs/about/01_overview).
For example, to deploy a 3-member Coherence RAG cluster that uses the built-in `all-MiniLM-L6-v2` embedding model and the OpenAI `gpt-4o-2024-08-06` chat model, you could use the following deployment YAML:
coherence-rag-demo.yaml
```yaml
apiVersion: coherence.oracle.com/v1
kind: Coherence
metadata:
  name: coherence-rag-demo
spec:
  replicas: 3
  image: ghcr.io/coherence-community/coherence-rag-server:15.1.1-0-0
  cluster: coherence-rag-demo
  env:
    - name: MODEL_EMBEDDING
      value: -/all-MiniLM-L6-v2
    - name: MODEL_CHAT
      value: OpenAI/gpt-4o-2024-08-06
    - name: OPENAI_API_KEY
      valueFrom:
        secretKeyRef:
          name: openai-api-key
          key: key
  jvm:
    memory:
      heapSize: 16g
  ports:
    - name: server
      port: 7001
```
The example above exposes the Coherence RAG REST API on port 7001 on each pod. It also creates a `coherence-rag-demo-server` Kubernetes service that maps to that port on all of the pods, allowing you to expose the REST API via an ingress (see the sketch below), or to forward a local port to it for testing.
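As an illustration of the ingress option, a minimal Ingress sketch targeting the generated service might look like the following. This is not part of the demo: it assumes an NGINX ingress controller is installed in your cluster, and `rag.example.com` is a placeholder hostname.

```yaml
# Hypothetical ingress for the coherence-rag-demo-server service;
# adjust ingressClassName and host for your environment
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: coherence-rag-demo-ingress
spec:
  ingressClassName: nginx
  rules:
    - host: rag.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: coherence-rag-demo-server
                port:
                  number: 7001
```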
Note:
For security reasons, the OpenAI API key required by the chat model is passed as an environment variable that references a Kubernetes secret. To create the secret, run the following command, specifying your own OpenAI API key within the `from-literal` argument:
```bash
kubectl create secret generic openai-api-key --from-literal=key=sk-...
```
Now that we have the secret configured, we can deploy our demo cluster:
```bash
kubectl apply -f coherence-rag-demo.yaml
```
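Before moving on, you may want to confirm that all three members started successfully. A quick sketch of how to check (the `coherenceCluster` pod label is applied by the Coherence Operator, but verify it against your operator version):

```bash
# Check the status of the Coherence resource created above
kubectl get coherence coherence-rag-demo -n default

# Watch the three member pods become ready (assumes the operator's
# standard coherenceCluster pod label)
kubectl get pods -l coherenceCluster=coherence-rag-demo -n default -w
```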
Finally, we can forward local port 7001 to the `coherence-rag-demo-server` service, which will allow us to make the REST API calls described in the following sections to ingest documents, perform vector searches, and augment chat conversations with the results of those searches:
```bash
kubectl port-forward service/coherence-rag-demo-server 7001:7001 -n default
```
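With the tunnel in place, a quick way to verify that the server is reachable is to issue any HTTP request against the forwarded port. The actual endpoint paths are covered in the following sections; this is just a connectivity check:

```bash
# Any HTTP response (even a 404) confirms the port-forward is working
curl -i http://localhost:7001/
```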