Release Update 23.6 Features

AI Vector Search: New Vector Distance Metric

AI Vector Search was extended in 23.6 to include a new vector distance metric called Jaccard distance.

This feature gives users an additional option for measuring the distance between vectors. Jaccard distance compares binary vectors as sets: it is one minus the ratio of the number of bit positions set in both vectors to the number set in either.
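
For illustration, here is a minimal sketch of using the new metric with the VECTOR_DISTANCE function (table, column, and bind names are hypothetical; the JACCARD keyword follows the existing metric-keyword pattern and applies to binary vectors):

    -- Hypothetical table with a binary vector column
    CREATE TABLE doc_signatures (
      id  NUMBER PRIMARY KEY,
      sig VECTOR(64, BINARY)
    );

    -- Rank rows by Jaccard distance to a query vector
    SELECT id, VECTOR_DISTANCE(sig, :query_sig, JACCARD) AS dist
    FROM   doc_signatures
    ORDER  BY dist
    FETCH FIRST 5 ROWS ONLY;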

View Documentation

Hybrid Vector Index

The Hybrid Vector Index (HVI) is a new index that allows users to easily index and query their documents using a combination of full text search and semantic vector search to achieve higher quality search results.

The Hybrid Vector Index (HVI) simplifies the process of transforming documents into forms that are amenable both to vector similarity search and to textual search, through a single index DDL.

The HVI provides a unified query API that allows users to run textual queries, vector similarity queries, or hybrid queries that leverage both of these approaches.  This lets users easily customize the search experience and enhances search results.
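
As a sketch, a hybrid vector index is created with a single DDL statement and queried through the DBMS_HYBRID_VECTOR.SEARCH API (the index, table, and model names here are hypothetical, and the JSON query specification is abbreviated; see the documentation for the full set of options):

    -- One DDL statement builds both the text and vector indexes
    -- (my_doc_model is an assumed, previously loaded embedding model)
    CREATE HYBRID VECTOR INDEX docs_hvi ON docs(doc_text)
      PARAMETERS ('MODEL my_doc_model');

    -- Hybrid query combining keyword and semantic similarity
    SELECT JSON_SERIALIZE(
             DBMS_HYBRID_VECTOR.SEARCH(
               JSON('{ "hybrid_index_name" : "docs_hvi",
                       "search_text"       : "quarterly revenue outlook" }')
             ) PRETTY)
    FROM dual;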

View Documentation

Partition-Local Neighbor Partition Vector Index

This feature enables LOCAL indexing for Neighbor Partition Vector Indexes, optimizing search performance for partitioned tables. This feature conceptually creates a dedicated vector index for each partition, allowing queries with partition key filters to search only the relevant index partitions. As a result, vector searches are more efficient, leading to significantly lower response times when querying large partitioned datasets.

Large enterprise datasets are frequently partitioned by relational attributes to optimize performance. By enabling LOCAL Neighbor Partition Vector Indexes, users benefit from enhanced scalability and accelerated query performance through partition pruning. This approach also ensures more efficient data lifecycle management, making it ideal for handling large-scale enterprise workloads.
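
A minimal sketch, assuming a sales table partitioned by region (names are hypothetical, and the placement of the LOCAL keyword should be checked against the documentation):

    -- LOCAL creates one index partition per table partition
    CREATE VECTOR INDEX sales_ivf_idx ON sales(embedding)
      ORGANIZATION NEIGHBOR PARTITIONS
      DISTANCE COSINE
      WITH TARGET ACCURACY 95
      LOCAL;

    -- A filter on the partition key prunes the search to the
    -- relevant index partition(s)
    SELECT product_id
    FROM   sales
    WHERE  region = 'EMEA'
    ORDER  BY VECTOR_DISTANCE(embedding, :qv, COSINE)
    FETCH APPROX FIRST 10 ROWS ONLY;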

View Documentation

Persistent Neighbor Graph Vector Indexes

The HNSW vector index is an in-memory, multi-layered graph index. The time taken to recreate the in-memory graph after a restart can be reduced by keeping a disk checkpoint image of the graph. This feature adds the checkpoint format, as well as the framework to take a disk checkpoint and then use it to recreate the in-memory graph structure.

Without a checkpoint, it can take a long time after a restart before queries can obtain an index access plan. A higher-priority, checkpoint-based reload shortens the time until the index is available to query plans again.
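
The checkpoint is managed by the database rather than through new DDL, so creating an HNSW index looks the same as before (names are hypothetical; the automatic-checkpoint behavior described in the comment is taken from the feature description above):

    -- In-memory HNSW graph index; from 23.6, a disk checkpoint of the
    -- graph can be used to recreate the in-memory structure after a
    -- restart instead of rebuilding it from scratch
    CREATE VECTOR INDEX docs_hnsw_idx ON docs(embedding)
      ORGANIZATION INMEMORY NEIGHBOR GRAPH
      DISTANCE COSINE
      WITH TARGET ACCURACY 95;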

View Documentation

Sparse Vectors

Sparse vectors are vectors that typically have a large number of dimensions, of which only a few have non-zero values. These vectors are often produced by sparse encoding models such as SPLADE and BM25. Conceptually, each dimension in a sparse vector represents a keyword from a specific vocabulary. For a given document, the non-zero dimension values in the vector correspond to the keywords (and their variations) that appear in that document.

Sparse vectors, such as those generated by models like SPLADE and BM25, often outperform dense vectors from models such as BERT, in terms of keyword sensitivity and effectiveness in out-of-domain searches. This superior performance is especially valuable in applications where precise keyword matching is crucial, like in legal or academic research. Additionally, sparse vectors are often used for Hybrid Vector Search, where they can be used alongside dense vectors to combine semantic searches and keyword searches, and provide more relevant search results.
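
For illustration, a sketch of declaring and loading a sparse vector column (the textual sparse-vector format shown, that is, the total dimension count followed by arrays of the non-zero indices and values, is an assumption to verify against the documentation):

    -- 1,000 dimensions, but only non-zero entries are stored
    CREATE TABLE sparse_docs (
      id  NUMBER PRIMARY KEY,
      emb VECTOR(1000, FLOAT32, SPARSE)
    );

    -- Non-zero values at dimensions 7 and 942 only
    INSERT INTO sparse_docs
      VALUES (1, '[1000, [7, 942], [0.83, 1.25]]');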

View Documentation

Transactional Support for Neighbor Graph Vector Indexes

The HNSW index is an in-memory hierarchical graph index for vector data. In 23.4 and 23.5, DML was not allowed on tables that have an HNSW index built on their vector column(s). This feature enables transactions to be executed on such tables. Moreover, vector search queries that use the HNSW index see transactionally consistent results, based on their read snapshot. Transactional consistency is guaranteed even on Oracle RAC, where the HNSW index is duplicated on all instances in the cluster, DML can occur on one or more instances, and search queries can be executed on any instance.

The HNSW index is the fastest vector search index that Oracle offers in 23ai, so customers want to use it for search queries while also issuing DML against relational or vector columns in the underlying table. Because DML may render the in-memory HNSW index structures stale, special protocols are added by this feature to guarantee transactionally consistent results.
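
A sketch of what this enables (table, column, and bind names are hypothetical): DML and approximate search can now be mixed on an HNSW-indexed table, with queries seeing transactionally consistent results:

    -- DML on an HNSW-indexed table is allowed from 23.6
    INSERT INTO docs (id, doc_text, embedding)
      VALUES (42, 'new document', :new_vec);

    -- An approximate search in the same transaction reflects the
    -- insert, based on the query's read snapshot
    SELECT id
    FROM   docs
    ORDER  BY VECTOR_DISTANCE(embedding, :qv, COSINE)
    FETCH APPROX FIRST 5 ROWS ONLY;

    COMMIT;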

View Documentation

Vector Format Output for Feature Extraction Algorithm

Feature extraction algorithms produce a set of features that represent projections into a lower-dimensional latent space. The output is typically numerical and dense, so a VECTOR type representation is the natural choice.

Feature extraction algorithms represent a principled approach to vectorizing relational data. The vectorized representation can be used for similarity search.
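
As a sketch only: assuming a previously built OML feature extraction model (here a hypothetical model named my_nmf_model) and assuming the VECTOR output is surfaced through the VECTOR_EMBEDDING SQL operator, the projection could be retrieved directly as a VECTOR value; check the documentation for the exact operator and model settings:

    -- Project each row into the model's latent space and return the
    -- dense result as a VECTOR (operator usage is an assumption)
    SELECT id,
           VECTOR_EMBEDDING(my_nmf_model USING *) AS features
    FROM   customers;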

View Documentation

GoldenGate Replication of JSON-Relational Duality Views

This feature allows developers to use Oracle GoldenGate technology to replicate JSON-relational duality view data as JSON documents, instead of relational tables, from an Oracle Database to a target Oracle or non-Oracle database.

Replication of JSON data is an important feature for high availability, fail-over, and real-time migration from a non-Oracle database to Oracle Database.

With the ability to replicate JSON documents from JSON-relational duality views, Oracle GoldenGate replication to non-Oracle databases such as MongoDB (a document database) or Redis (a NoSQL key/value store) is simple and performant.

Developers need not write complex JSON and SQL transformations to shape data for relational and document-based updates, or pay the cost of reconstructing application objects on the target database.

View Documentation

JSON-Relational Duality Views: Hidden and Generated Fields

You can map duality-view data to hidden fields, which are absent from the view's documents. A generated field can use the values of hidden fields.

A document supported by a duality view can be simpler if it need not have a field for every underlying column, in particular for columns needed only for calculating other field values.
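
A sketch of the idea (the placement of the HIDDEN keyword is an assumption based on this description; table and field names are hypothetical):

    CREATE JSON RELATIONAL DUALITY VIEW emp_dv AS
      SELECT JSON {'_id'    : e.empno,
                   'name'   : e.ename,
                   -- mapped, but absent from the view's documents
                   'salary' : e.sal HIDDEN}
      FROM emp e WITH INSERT UPDATE DELETE;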

View Documentation

JSON-Relational Duality Views: Add Fields With Calculated Values

Duality views generate JSON documents containing objects whose fields map to columns of relational tables. You can add object fields whose values are calculated automatically.

When the database automatically augments objects by adding calculated fields, the calculation need not be repeated by multiple, separate client applications. Field calculations can also refer to stored data that is not exposed in the resulting objects, a capability similar to redaction.
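
A sketch (the GENERATED keyword and expression placement are assumptions based on this description): a calculated field can combine columns, including hidden ones, into a value computed by the database:

    CREATE JSON RELATIONAL DUALITY VIEW emp_pay_dv AS
      SELECT JSON {'_id'      : e.empno,
                   'name'     : e.ename,
                   'sal'      : e.sal HIDDEN,
                   'comm'     : e.comm HIDDEN,
                   -- computed by the database from the hidden fields
                   'totalPay' : (e.sal + NVL(e.comm, 0)) GENERATED}
      FROM emp e WITH UPDATE;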

View Documentation

JSON Collection Views

JSON collection views are special, read-only database views that expose JSON objects in a JSON-type column named DATA.

JSON collection views are conceptually close to JSON-relational duality views, but they have fewer restrictions because they are read only.
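
A minimal sketch (table and column names are hypothetical): the view exposes one JSON object per row in the single DATA column:

    -- Read-only collection view over a relational table
    CREATE JSON COLLECTION VIEW emp_cv AS
      SELECT JSON {'empno' : empno,
                   'name'  : ename,
                   'job'   : job}
      FROM emp;

    -- Each row of the view is one document in column DATA
    SELECT v.data FROM emp_cv v;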

View Documentation

JSON Replication

The logical_replication_clause of the CREATE/ALTER TABLE statement is extended to allow disabling and enabling of partial JSON updating under supplemental logging.

Partial JSON updating makes replication more efficient because less data needs to be replicated or modified: only the change is sent and applied on the remote side, instead of all the updated data. Using this functionality, you get better performance on the same hardware, reducing hardware costs.
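
As a sketch only (the exact keywords that the extended logical_replication_clause uses to toggle partial JSON updating are an assumption; check the CREATE and ALTER TABLE documentation):

    -- Enable logical replication for the table, with partial JSON
    -- updates recorded under supplemental logging (keyword placement
    -- is an assumption)
    ALTER TABLE orders ENABLE LOGICAL REPLICATION ALL KEYS
      PARTIAL JSON;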

View Documentation

JSON Search Index Path Subsetting

When creating a JSON search index you can specify the fields to include or exclude from indexing: path subsetting.

Path subsetting can reduce the size of a search index and improve its performance.
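
For illustration, a sketch of a subsetted JSON search index (the PARAMETERS string is an assumption; see the documentation for the exact include/exclude path syntax):

    -- Index only the fields the application actually searches
    CREATE SEARCH INDEX po_search_idx ON purchase_orders (po_doc)
      FOR JSON
      PARAMETERS ('PATHS (INCLUDE ($.PONumber, $.ShippingInstructions.name))');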

View Documentation

Replication Support for JSON Collection Tables

JSON Collection Tables can be enabled for logical replication using GoldenGate. Replication is supported to and from JSON-Relational Duality Views, as well as to and from third-party products such as MongoDB.

Replication is a basic database functionality that works between Oracle Databases. It is also used to facilitate online migration to Oracle Database from third-party databases, such as MongoDB.
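
A sketch (GoldenGate capture/apply configuration itself is outside the scope of this note; the replication clause shown is the pre-existing logical_replication_clause):

    -- JSON collection table: one JSON document per row, in column DATA
    CREATE JSON COLLECTION TABLE orders;

    -- Mark the table for logical replication (capture by GoldenGate)
    ALTER TABLE orders ENABLE LOGICAL REPLICATION ALL KEYS;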

View Documentation

CONSTRAINT_NOVALIDATE, a Data Pump Import TRANSFORM Parameter

The Oracle Data Pump TRANSFORM parameter option CONSTRAINT_NOVALIDATE enables you to set validation preferences. It accepts two values: Y and N. When set to Y, constraints are not validated during import.

Validating constraints during import can be unnecessary when they were already valid on the source, and it slows the migration process. Validation can be done after import.
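
For example (the connect string, directory, and dump file names are hypothetical):

    impdp hr@mydb DIRECTORY=dpump_dir DUMPFILE=hr.dmp \
      TRANSFORM=CONSTRAINT_NOVALIDATE:Y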

View Documentation

Disable Statistics Gathering During Autonomous Database Import Operations

The DATA_OPTIONS=DISABLE_STATS_GATHERING parameter option disables statistics gathering during an import job.

Statistics gathering can have a performance impact on a large import operation. This parameter controls whether Oracle Autonomous Database automatically gathers statistics for DML operations, such as the data loading performed by an Oracle Data Pump import job, and lets you disable that gathering for the duration of the job. The parameter is also supported for on-premises and user-managed cloud services import operations; although statistics gathering is not automatic in those environments, the parameter can still be useful there.
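
For example (the connect string, directory, and dump file names are hypothetical):

    impdp admin@adb_high DIRECTORY=data_pump_dir DUMPFILE=exp%L.dmp \
      DATA_OPTIONS=DISABLE_STATS_GATHERING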

View Documentation

Enhancements to Oracle Data Redaction

This release includes many enhancements to Oracle Data Redaction, such as the optimization of existing capabilities and the removal of previous limitations.

These enhancements to Oracle Data Redaction improve the overall experience.

View Documentation

Parallel Index Creation Parameters for Data Pump Import

The Oracle Data Pump Import command-line mode ONESTEP_INDEX parameter optimizes index creation concurrency and balances it with overall job parallelism. It does this in conjunction with the INDEX_THRESHOLD parameter.

Index creation is an essential step in the migration process. Enabling parallelism for large index creation, and balancing it with overall import job parallelism, makes more efficient use of the parallel resources assigned to a Data Pump import job. It makes the logical migration process more effective by substantially reducing the time to import. Finally, it removes the effort associated with the extra step of extracting the large index definitions from the source dump files and creating those large indexes outside the import job.
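
A sketch of the invocation (names are hypothetical, and the ONESTEP_INDEX value syntax is an assumption; INDEX_THRESHOLD, which works in conjunction with it, is omitted here because its value syntax is not shown in this note):

    # Let the import job create large indexes in parallel, balancing
    # index concurrency against overall job parallelism
    # (the parameter value below is an assumption for illustration)
    impdp hr@mydb DIRECTORY=dpump_dir DUMPFILE=hr.dmp PARALLEL=8 \
      ONESTEP_INDEX=YES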

View Documentation

Sessionless Transactions

Managing a transaction normally requires the connection and session resources to be tied to the transaction throughout its lifecycle, so the session or connection can be released only after the transaction has ended. This often results in underutilization of sessions/connections. With Sessionless Transactions, after you start a transaction you can suspend and resume it during its lifecycle. The session or connection can be released back to the pool and reused by other transactions, effectively multiplexing transactions across sessions/connections.

Sessionless Transactions provide the ability for applications to suspend and resume transactions across sessions/connections (single instance or RAC) without the need for an external transaction manager, and without the application having to coordinate the commit and recovery protocols. The database manages the transaction lifecycle, including commit and recovery. Application performance and throughput benefit from reduced commit latency, since fewer client-server round trips are needed. Because external coordination is not required, Sessionless Transactions vastly simplify mid-tier or app-tier infrastructure and significantly decrease downtime compared with externally coordinated transactions (such as XA).
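
A PL/SQL sketch of the lifecycle (the procedure names and arguments follow the author's reading of the sessionless-transaction API in DBMS_TRANSACTION and should be treated as assumptions):

    -- Session 1: start a named transaction, do some work, then
    -- suspend it, releasing this session back to the pool
    DECLARE
      gtrid RAW(64) := UTL_RAW.CAST_TO_RAW('txn-1001');
    BEGIN
      DBMS_TRANSACTION.START_TRANSACTION(gtrid);     -- assumed API
      UPDATE accounts SET balance = balance - 100 WHERE id = 1;
      DBMS_TRANSACTION.SUSPEND_TRANSACTION;          -- assumed API
    END;
    /

    -- Session 2 (any instance in the cluster): resume and commit
    BEGIN
      DBMS_TRANSACTION.RESUME_TRANSACTION(
        UTL_RAW.CAST_TO_RAW('txn-1001'));            -- assumed API
      UPDATE accounts SET balance = balance + 100 WHERE id = 2;
      COMMIT;
    END;
    /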

View Documentation

XMLTYPE_STORAGE_CLAUSE, a Data Pump Import TRANSFORM Parameter

Transportable Binary XML simplifies XML data storage and makes the data easier to transport: it does not store the metadata used to encode or decode XML data in a central table.

XMLTYPE_STORAGE_CLAUSE now takes the options TRANSPORTABLE BINARY XML or BINARY XML. Oracle recommends TRANSPORTABLE BINARY XML, the new XMLType storage type in Release 23ai, which stores data in a self-contained binary format.

Use the BINARY XML (non-transportable) XMLType storage to store data in a compact, post-parse, XML schema-aware binary format designed specifically for XML data. Binary XML is non-transportable because it stores the metadata used to encode or decode XML data efficiently in a central table.

Data Pump can export and import data of type XMLType regardless of the source database XMLType storage format (object-relational, binary XML, or CLOB). Oracle Data Pump exports and imports XML data as Transportable Binary XML, so the source and target databases can use different XMLType storage models for the data.
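
For example (names are hypothetical; the storage-clause value contains spaces, so it must be quoted, with the quotes escaped on most shells):

    impdp hr@mydb DIRECTORY=dpump_dir DUMPFILE=hr.dmp \
      TRANSFORM=XMLTYPE_STORAGE_CLAUSE:\"TRANSPORTABLE BINARY XML\"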

View Documentation