Glossary
dimension format
The dimensions of a vector can be represented using numbers of varying types and precisions, called the dimension format. The dimension formats supported by vector embeddings in Oracle Database are INT8 (1-byte signed integer), FLOAT32 (4-byte single-precision floating-point number), and FLOAT64 (8-byte double-precision floating-point number). All dimensions of a vector must have the same dimension format.
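To make the byte widths above concrete, the following Python sketch computes the raw size of a vector's dimension values for each format. The helper names and the 1024-dimension example are illustrative assumptions; the result covers only the dimension values and says nothing about how Oracle Database lays out a vector on disk.

```python
import struct

# struct type codes whose standard sizes match the byte widths in the definition.
FORMAT_CODES = {"INT8": "b", "FLOAT32": "f", "FLOAT64": "d"}

def bytes_per_dimension(dimension_format):
    """Byte width of a single dimension in the given format."""
    return struct.calcsize("<" + FORMAT_CODES[dimension_format])

def raw_vector_bytes(num_dimensions, dimension_format):
    """Size of the dimension values only (no header or metadata)."""
    return num_dimensions * bytes_per_dimension(dimension_format)

# Hypothetical 1024-dimension vector in each supported format.
sizes = {fmt: raw_vector_bytes(1024, fmt) for fmt in FORMAT_CODES}
print(sizes)
```

Under these assumptions, switching a vector from FLOAT64 to INT8 shrinks the raw values by a factor of eight, at the cost of precision.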
distance metric
A distance metric is the mathematical function used to compute the distance between vectors. Popular distance metrics supported by Oracle AI Vector Search include Euclidean distance, Cosine distance, and Manhattan distance, among others.
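The three metrics named above can be written out directly from their textbook definitions. This is a plain Python sketch for illustration, not Oracle's implementation; the function names and sample vectors are assumptions.

```python
import math

def euclidean_distance(a, b):
    """Straight-line distance: square root of the summed squared differences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan_distance(a, b):
    """Sum of the absolute differences along each dimension."""
    return sum(abs(x - y) for x, y in zip(a, b))

def cosine_distance(a, b):
    """One minus the cosine of the angle between the two vectors.

    Undefined for a zero vector, so this sketch assumes non-zero inputs.
    """
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

a = [1.0, 2.0, 3.0]
b = [4.0, 6.0, 3.0]
print(euclidean_distance(a, b))  # 5.0
print(manhattan_distance(a, b))  # 7.0
print(cosine_distance(a, b))
```

Euclidean and Manhattan distance depend on the magnitudes of the vectors, while Cosine distance depends only on the angle between them, so the same pair of vectors can rank differently under different metrics.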
embedding model
Embedding models are Machine Learning algorithms that are trained to capture the semantic information of unstructured data and represent it as vectors in multidimensional space. Different embedding models exist for different types of unstructured data, for example, BERT for text data and ResNet-50 for image data.
hybrid search
Hybrid search is an advanced information retrieval technique that lets you search documents by both keywords and vectors. Hybrid searches are run against hybrid vector indexes by querying them in various search modes. By integrating traditional keyword-based text search with vector-based similarity search, you can improve the overall search experience and provide users with more relevant information.
hybrid vector index
A hybrid vector index is a class of specialized Domain Index that combines the existing Oracle Text search indexes and Oracle AI Vector Search vector indexes into one unified structure. A single index contains both textual and vector fields for a document, enabling you to perform a combination of keyword-based text search and vector-based similarity search simultaneously.
large language model
Large language models (LLMs) are advanced Machine Learning models designed to understand, process, and generate natural language for rich human interaction. They are typically built using deep learning algorithms and are pretrained on vast amounts of data. Popular examples include OpenAI’s GPT-4, Cohere’s Command R+, and Meta’s Llama 3.
multi-vector
Multi-vector refers to a scenario where multiple vectors correspond to a single entity. For example, a large text document can be chunked into paragraphs and every paragraph can be embedded into a separate vector. A similarity search query could retrieve matching documents based on the most similar paragraph (closest vectors) per document to a given query vector. Oracle AI Vector Search has the option of Partitioned Row Limiting Fetch syntax to enable efficient multi-vector searches.
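The per-document selection described above can be sketched in Python: keep only the closest chunk for each document, then rank documents by that best distance. This illustrates the idea only and is not the Partitioned Row Limiting Fetch syntax; the function name, the tuple layout, and the sample data are assumptions.

```python
import math

def top_documents(chunks, query_vector, k):
    """Return the k documents whose best-matching chunk is closest to the query.

    chunks is a list of (doc_id, chunk_id, vector) tuples.
    """
    best = {}  # doc_id -> (distance, chunk_id) of the closest chunk seen so far
    for doc_id, chunk_id, vector in chunks:
        distance = math.dist(vector, query_vector)
        if doc_id not in best or distance < best[doc_id][0]:
            best[doc_id] = (distance, chunk_id)
    ranked = sorted(best.items(), key=lambda item: item[1][0])
    return [(doc_id, chunk_id, distance) for doc_id, (distance, chunk_id) in ranked[:k]]

# Hypothetical data: three documents, each split into one or two chunks.
chunks = [
    ("doc_a", 1, [0.0, 0.0]),
    ("doc_a", 2, [1.0, 1.0]),
    ("doc_b", 1, [5.0, 5.0]),
    ("doc_b", 2, [0.9, 1.2]),
    ("doc_c", 1, [9.0, 9.0]),
]
result = top_documents(chunks, [1.0, 1.0], k=2)
print(result)
```

Each document appears at most once in the result, even when several of its chunks are close to the query vector.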
neighbor graph
A neighbor graph is a graph-based data structure used in vector indexes. For example, a Hierarchical Navigable Small World (HNSW) vector index leverages a multilayer neighbor graph index. In a neighbor graph, each vertex of the graph represents a vector in the data set, and edges are created between vertices representing similar vectors.
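As a rough illustration of the vertex-and-edge structure described above, the following Python sketch builds a single-layer neighbor graph by brute force, linking each vector to its nearest neighbors. It is deliberately much simpler than a multilayer HNSW index; the function name, the `num_neighbors` parameter, and the sample vectors are assumptions.

```python
import math

def build_neighbor_graph(vectors, num_neighbors):
    """Map each vector's position to the positions of its closest vectors.

    Every vertex is a vector in the data set; an edge i -> j means that
    vector j is among the num_neighbors vectors closest to vector i.
    """
    graph = {}
    for i, vector in enumerate(vectors):
        others = [
            (math.dist(vector, other), j)
            for j, other in enumerate(vectors)
            if j != i
        ]
        others.sort()  # closest first
        graph[i] = [j for _, j in others[:num_neighbors]]
    return graph

# Hypothetical data set with two well-separated groups of vectors.
vectors = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [10.0, 10.0], [10.0, 11.0]]
graph = build_neighbor_graph(vectors, num_neighbors=2)
print(graph)
```

A search can then walk the edges from vertex to vertex toward the query vector instead of comparing the query against every vector in the data set.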
query accuracy
Query accuracy is an intuitive indicator of the quality of an approximate query result obtained from a vector index search. Consider a query vector for which an exact search, which scans all vectors in the data set, returns the Top 5 matches {ID1, ID3, ID5, ID7, ID9}, and an approximate vector index search returns the Top 5 matches {ID1, ID3, ID5, ID9, ID10}. Since the approximate result has 4 out of 5 correct matches, the query accuracy is 80%.
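The arithmetic in this example can be reproduced with a few lines of Python. The function name is an assumption; the IDs are the ones from the example.

```python
def query_accuracy(exact_ids, approximate_ids):
    """Fraction of the exact Top-K matches that the approximate search also returned."""
    exact = set(exact_ids)
    correct = sum(1 for item in approximate_ids if item in exact)
    return correct / len(exact_ids)

exact_top5 = ["ID1", "ID3", "ID5", "ID7", "ID9"]
approximate_top5 = ["ID1", "ID3", "ID5", "ID9", "ID10"]

accuracy = query_accuracy(exact_top5, approximate_top5)
print(f"{accuracy:.0%}")  # 80%
```

The order of the returned IDs does not affect the result; only membership in the exact Top 5 counts.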
query vector
Query vector refers to the vector embedding representing the item for which the user wants to find similar items using similarity search. For example, while searching for movies similar to a user’s favorite movie, the vector embedding representing the user’s favorite movie is the query vector.
Retrieval-Augmented Generation
Retrieval-Augmented Generation (RAG) is a popular technique for enhancing the accuracy of responses generated by Large Language Models by augmenting the user-provided prompt with relevant, up-to-date, enterprise-specific content retrieved using AI Vector Search. Applications, such as Chat Assistants, built using RAG are often more accurate, reliable, and cost-effective.
similarity search
Similarity search is a common operation in information retrieval that finds items in a data set that are similar to a user-provided query item, for example, finding movies similar to a user’s favorite movie. Vectors enable efficient similarity searches because the mathematical distance between vectors is a proxy for similarity: the more similar two items are, the smaller the distance between their vectors.
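A minimal Python sketch of an exact similarity search follows: it compares the query vector against every item and keeps the closest ones. The movie names and two-dimensional vectors are made up for illustration; real embeddings typically have hundreds or thousands of dimensions.

```python
import heapq
import math

def exact_similarity_search(items, query_vector, k):
    """Return the names of the k items whose vectors are closest to the query.

    items is a list of (name, vector) pairs. Every vector is compared with the
    query, so the result is exact but the cost grows with the data set size.
    """
    closest = heapq.nsmallest(
        k, items, key=lambda item: math.dist(item[1], query_vector)
    )
    return [name for name, _ in closest]

# Hypothetical movie embeddings.
movies = [
    ("space_opera", [0.9, 0.1]),
    ("courtroom_drama", [0.1, 0.9]),
    ("alien_invasion", [0.8, 0.2]),
    ("romantic_comedy", [0.2, 0.7]),
]
favorite_movie_vector = [0.9, 0.15]

similar = exact_similarity_search(movies, favorite_movie_vector, k=2)
print(similar)  # ['space_opera', 'alien_invasion']
```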
vector
A vector is a mathematical entity that has a magnitude and a direction. It is typically represented as an array of numbers, which are coordinates that define its position in a multidimensional space.
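As a small worked example of magnitude and direction, the Python sketch below splits a two-dimensional vector into its length and its unit-length direction. The function names are assumptions.

```python
import math

def magnitude(vector):
    """Length of the vector: square root of the sum of squared coordinates."""
    return math.sqrt(sum(x * x for x in vector))

def direction(vector):
    """Unit-length vector pointing the same way (undefined for the zero vector)."""
    length = magnitude(vector)
    return [x / length for x in vector]

v = [3.0, 4.0]
print(magnitude(v))  # 5.0
print(direction(v))  # [0.6, 0.8]
```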
vector distance
Vector distance refers to the mathematical distance between two vectors in a multidimensional space. The vector distance between similar items is smaller than the vector distance between dissimilar items. Vector distance is meaningful only if the vectors being compared are generated by the same embedding model.
vector embedding
A vector embedding is a numerical representation of text, image, audio, or video data that encodes the semantic content of the data, and not the underlying words or pixels. The terms vector and vector embedding are often used interchangeably in AI Vector Search.
vector index
Vector indexes are a class of specialized indexing data structures that are designed to accelerate similarity searches using high-dimensional vectors. They use techniques such as clustering, partitioning, and neighbor graphs to group vectors representing similar items, which drastically reduces the search space, thereby making the search process extremely efficient. Unlike traditional database indexes, vector indexes enable approximate similarity searches, which allow users to trade off query accuracy for query performance to better suit application requirements.
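The trade-off between accuracy and performance can be illustrated with a toy partition-based index in Python. Vectors are grouped by their nearest centroid, and a search scans only the partitions closest to the query. This is a simplified sketch, not how Oracle's vector indexes are implemented: the centroids are fixed by hand rather than learned, and all names, including `num_probes`, are assumptions.

```python
import math

def build_partitions(vectors, centroids):
    """Assign each vector (by position) to the partition of its nearest centroid."""
    partitions = {c: [] for c in range(len(centroids))}
    for vector_id, vector in enumerate(vectors):
        nearest = min(
            range(len(centroids)), key=lambda c: math.dist(vector, centroids[c])
        )
        partitions[nearest].append(vector_id)
    return partitions

def approximate_search(vectors, centroids, partitions, query, k, num_probes):
    """Scan only the num_probes partitions whose centroids are closest to the query."""
    probe_order = sorted(
        range(len(centroids)), key=lambda c: math.dist(query, centroids[c])
    )
    candidates = [v for c in probe_order[:num_probes] for v in partitions[c]]
    candidates.sort(key=lambda v: math.dist(vectors[v], query))
    return candidates[:k]

# Hypothetical data set and hand-picked centroids.
vectors = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.5], [4.0, 4.0],
           [5.0, 4.0], [10.0, 10.0], [10.0, 9.0]]
centroids = [[0.0, 0.0], [5.0, 5.0], [10.0, 10.0]]
partitions = build_partitions(vectors, centroids)

query = [2.4, 2.4]
fast = approximate_search(vectors, centroids, partitions, query, k=3, num_probes=1)
full = approximate_search(vectors, centroids, partitions, query, k=3, num_probes=3)
print(fast)  # [2, 1, 0]: only the nearest partition was scanned
print(full)  # [3, 2, 1]: all partitions scanned, same as an exact search
```

With one probe the search compares the query against three vectors instead of seven but misses vector 3, which sits in a neighboring partition, so two of the three exact matches are returned. Probing more partitions raises accuracy and increases the work done per query.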
Vector Memory Pool
Vector Memory Pool is a region of the System Global Area (SGA) that is dedicated to storing In-Memory Neighbor Graph Vector Indexes (HNSW index), as well as metadata for Neighbor Partition Vector Indexes. It can be specified by using the VECTOR_MEMORY_SIZE parameter.