24.2 Persistent Data-Guide Information as Part of a JSON Search Index

JSON data-guide information can be saved persistently as part of the JSON search index infrastructure. This information is then updated automatically as new JSON content is added. You specify this optional behavior when you create a JSON search index.

You can use CREATE SEARCH INDEX with keywords FOR JSON to create a search index, a data guide, or both at the same time. The default behavior is to create a search index without data-guide support.

To create persistent data-guide information as part of a JSON search index, specify DATAGUIDE ON in the PARAMETERS clause of CREATE SEARCH INDEX. To enable data-guide support without enabling support for search, also specify SEARCH_ON NONE. Example 24-1 illustrates this.
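For comparison with Example 24-1, here is a minimal sketch of the other two combinations described above. It assumes the same j_purchaseorder table and data column. The index names po_search_only_idx and po_search_dg_idx are hypothetical. Treat the two statements as alternatives: a column typically cannot carry two domain indexes of the same type at once.

```sql
-- Alternative 1: search support only (the default behavior).
-- No PARAMETERS clause is needed.
CREATE SEARCH INDEX po_search_only_idx
  ON j_purchaseorder (data) FOR JSON;

-- Alternative 2: search support plus persistent data-guide information.
-- SEARCH_ON is left at its default, so only DATAGUIDE is specified.
CREATE SEARCH INDEX po_search_dg_idx
  ON j_purchaseorder (data) FOR JSON
    PARAMETERS ('DATAGUIDE ON');
```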

You can use ALTER INDEX ... REBUILD to enable or disable data-guide support for an existing JSON search index. Example 24-2 illustrates this — it enables data-guide support for the search index of Example 30-24.
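The reverse operation, disabling data-guide support on an existing index, can be sketched the same way. This assumes the index po_search_idx already exists and currently has data-guide support enabled.

```sql
-- Turn off data-guide support for an existing JSON search index.
ALTER INDEX po_search_idx REBUILD PARAMETERS ('DATAGUIDE OFF');
```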

Note:

To create a data guide-enabled JSON search index, or to data guide-enable an existing JSON search index, you need database privilege CTXAPP and Oracle Database Release 12c (12.2.0.1) or later.

Note:

A data guide-enabled JSON search index can be built only on a column that is known to contain JSON data, which means that it is either of JSON data type or it has an is json check constraint. In the latter case, for the data-guide information in the index to be updated, the check constraint must be enabled.

If the check constraint becomes disabled for some reason then you must rebuild the data-guide information in the index and re-enable the check constraint, to resume automatic data-guide support updating, as follows:

ALTER INDEX index_name REBUILD PARAMETERS ('DATAGUIDE OFF');
ALTER INDEX index_name REBUILD PARAMETERS ('DATAGUIDE ON');
ALTER TABLE table_name ENABLE CONSTRAINT is_json_check_constraint_name;

In particular, using SQL*Loader (sqlldr) disables is json check constraints.
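After a bulk load, one way to see whether a check constraint has been disabled is to query the data dictionary. The sketch below assumes the table is named j_purchaseorder and is owned by the current user. Note that NOT NULL constraints are also listed with constraint type 'C', so identify the is json constraint by its name.

```sql
-- List check constraints on the table and show whether each is
-- ENABLED or DISABLED.
SELECT constraint_name, status
  FROM user_constraints
 WHERE table_name = 'J_PURCHASEORDER'
   AND constraint_type = 'C';
```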

When you enable persistent data-guide information it is part of the search index infrastructure, so it is always live: its content is automatically updated whenever the index is synchronized. Changes in the indexed data are reflected in the search index, including in its data-guide information, only after the index is synchronized.
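As an illustration, the following sketch synchronizes an index manually and then reads its persistent data guide. It assumes an index named po_search_idx on column data of table j_purchaseorder, with data-guide support enabled, and a user that is allowed to execute CTX_DDL. If the index was created to synchronize on commit or on a schedule, the manual synchronization step may be unnecessary.

```sql
-- Synchronize the index so that pending changes to the indexed data
-- are reflected in it, including in its data-guide information.
EXEC CTX_DDL.sync_index('po_search_idx');

-- Retrieve the persistent data guide from the index, pretty-printed.
SELECT DBMS_JSON.get_index_dataguide('J_PURCHASEORDER',
                                     'DATA',
                                     DBMS_JSON.FORMAT_HIERARCHICAL,
                                     DBMS_JSON.PRETTY)
  FROM DUAL;
```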

In addition, updates to the data-guide information in a search index are always additive: none of it is ever deleted. This is another reason that the data-guide information often does not accurately reflect the data in its document set: deletions within the documents it applies to are not reflected in it. If you need to ensure that such information accurately reflects the current data, then you must drop the JSON search index and create it anew.
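A minimal sketch of dropping and re-creating the index follows. It assumes that the index is named po_search_idx, that it is defined on column data of table j_purchaseorder, and that it should keep both search and data-guide support. Adjust the PARAMETERS clause to match the original index definition.

```sql
-- Discard the index, along with its accumulated data-guide information.
DROP INDEX po_search_idx;

-- Re-create it so that the data-guide information is rebuilt
-- from the documents that are currently in the table.
CREATE SEARCH INDEX po_search_idx
  ON j_purchaseorder (data) FOR JSON
    PARAMETERS ('DATAGUIDE ON');
```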

The persistent data-guide information in a search index can also include statistics, such as how frequently each JSON field is used in the document set. Statistics are present only if you explicitly gather them on the document set (gather them on the JSON search index, for example). They are not updated automatically — gather statistics anew if you want to be sure they are up to date. Example 24-3 gathers statistics on the JSON data indexed by JSON search index po_search_idx, which is created in Example 30-24.

Note:

When a local data guide-enabled JSON search index is created in a sharding environment, each individual shard contains the data-guide information for the JSON documents stored in that shard. For this reason, if you invoke data guide-related operations on the shard catalog database then they will have no effect.

Considerations for a Data Guide-Enabled Search Index on a Partitioned Table

The data-guide information in a data guide-enabled JSON search index that is local to a partitioned table is not partitioned. It is shared among all partitions.

Because the data-guide information in the index is only additive, dropping, merging, splitting, or truncating partitions has no impact on the index.

Exchanging a partitioned table with a table that is not partitioned updates the data-guide information in an index on the partitioned table, but any data guide-enabled index on the non-partitioned table must be rebuilt.

Avoid Persistent Data-Guide Information If Serializing Hash-Table Data

If you serialize Java hash tables or associative arrays (such as are found in JavaScript) as JSON objects, then avoid the use of persistent data-guide information.

The default hash-table serialization provided by popular libraries such as GSON and Jackson produces textual JSON documents with object field names that are taken from the hash-table key entries and with field values taken from the corresponding Java hash-table values. Serializing a single Java hash-table entry produces a new (unique) JSON field and value.

Persisted data-guide information reflects the shape of your data, and it is updated automatically as new JSON documents are inserted. Each hash-table key–value pair results in a separate entry in the JSON search index. Such serialization can thus greatly increase the size of the information maintained in the index. In addition to the large size, the many index updates affect performance negatively, making DML slow.

If you serialize a hash table or an associative array instead as a JSON array of objects, each of which includes a field derived from a hash-table key entry, then there are no such problems.

The default serialization of a hash table or associative array as a JSON object is indistinguishable from an object that has field names assigned by a developer. A JSON data guide cannot tell which (metadata-like) field names have been assigned by a developer and which (data-like) names might have been derived from a hash table or associative array. It treats all field names as essentially metadata, as if specified by a developer.

For example:

If you construct an application object using a hash table that has animalName as the hash key and sets of animal properties as values, then the resulting default serialization is a single JSON object that has a separate field ("cat", "mouse", ...) for each hash-table entry, with the field value being an object with the corresponding animal properties. This can be problematic in terms of data-guide size and performance because of the typically large number of fields ("cat", "mouse", ...) derived from the hash key.
If you instead construct an application array of animal structures, each of which has a field animalName (with value "cat", "mouse", ...), then the resulting serialization is a JSON array of objects, each of which has the same field, animalName. The corresponding data guide has no size or performance problem.
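The two shapes can be compared side by side in the following sketch. The table animals_demo and the properties legs and sound are hypothetical, introduced only for illustration. In practice you would choose one shape rather than mix both in a single column.

```sql
-- Hypothetical demo table; the IS JSON check constraint marks
-- the column as containing JSON data.
CREATE TABLE animals_demo
  (id   NUMBER PRIMARY KEY,
   data VARCHAR2(4000)
        CONSTRAINT animals_demo_is_json CHECK (data IS JSON));

-- Shape 1: default hash-table serialization. Each hash key ("cat",
-- "mouse") becomes a distinct field name, so every new kind of animal
-- adds new entries to the persistent data-guide information.
INSERT INTO animals_demo VALUES
  (1, '{"cat":   {"legs": 4, "sound": "meow"},
        "mouse": {"legs": 4, "sound": "squeak"}}');

-- Shape 2: array of objects. The hash key is stored as the value of
-- the field animalName, so the set of field names stays fixed no
-- matter how many animals are added.
INSERT INTO animals_demo VALUES
  (2, '[{"animalName": "cat",   "legs": 4, "sound": "meow"},
        {"animalName": "mouse", "legs": 4, "sound": "squeak"}]');
```

With the first shape, the number of distinct field names grows with the data. With the second, it depends only on the structure chosen by the developer.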

Example 24-1 Enabling Persistent Support for a JSON Data Guide But Not For Search

CREATE SEARCH INDEX po_dg_only_idx
  ON j_purchaseorder (data) FOR JSON
    PARAMETERS ('DATAGUIDE ON SEARCH_ON NONE');

Example 24-2 Enabling JSON Data-Guide Support For an Existing JSON Search Index

ALTER INDEX po_search_idx REBUILD PARAMETERS ('DATAGUIDE ON');

Example 24-3 Gathering Statistics on JSON Data Using a JSON Search Index

-- Owner and index names are passed as strings; 100 is the estimate percentage.
EXEC DBMS_STATS.gather_index_stats('DOCUSER', 'PO_SEARCH_IDX', NULL, 100);