30.8 Creating Multivalue Function-Based Indexes for JSON_EXISTS

For JSON data that is stored as JSON data type you can use a multivalue function-based index for SQL/JSON condition json_exists. Such an index targets scalar JSON values, either individually or within a JSON array.

The main use of a multivalue index is to index scalar values within arrays. This includes scalar array elements, but also scalar field values of object array elements.

A multivalue index can also index a single scalar value, but for queries that target a single value it is generally more performant to use a B-tree or bitmap index.

In a query, you use json_exists in the WHERE clause of a SELECT statement. Condition json_exists returns true if the data it targets matches the SQL/JSON path expression (or equivalent simple dot-notation syntax) in the query. Otherwise it returns false. It is common for the path expression to include a filter expression — matching then requires that the targeted data satisfy that filter.

You create a multivalue index using CREATE INDEX with keyword MULTIVALUE, and using either the syntax of SQL/JSON function json_table or simple dot-notation that you use in queries to specify the path to the indexed data. (However, you cannot use a SQL NESTED clause in place of json_table — a compile-time error is raised if you do that.)

You can create a composite function-based index, to index more than one virtual column, that is, more than one JSON field. A composite index acts like a set of function-based indexes. When used to query, you use function json_table to project specified JSON field values as virtual columns of SQL scalar values. Similarly, when used to define an index, the field values that json_table specifies are indexed as a composite function-based index.

When using json_table syntax to create a multivalue index you must use these error-handling clauses: ERROR ON ERROR NULL ON EMPTY NULL ON MISMATCH; otherwise, a query compile-time error is raised. When using simple dot-notation syntax without json_table, the behavior of these clauses is provided implicitly.

When using json_table syntax you can use a FOR ORDINALITY clause, to enable use of the index for queries that target specific array positions. (See COLUMNS Clause of SQL/JSON Function JSON_TABLE.)

For a multivalue index to be picked up by a query, the index must specify the SQL type of the data to be indexed, and the SQL type for the query result must match the type specified by the index.

If you create a non-composite multivalue index, that is, without using json_table syntax, then the index specification must include a data-type conversion item method (other than binary() and dateWithTime()), to indicate the SQL data type. See SQL/JSON Path Expression Item Methods for information about the data-type conversion item methods.

If the index uses an item method with "only" in its name then only queries that use that same item method can pick up the index. Otherwise (with a non-"only" method or with no method), any query that targets a scalar value (possibly as an array element) that can be converted to the type indicated by the item method can pick up the index.

For example, a multivalue index that uses item method numberOnly() can only be picked up for a query that also uses numberOnly(). But an index that uses number(), or that uses no item method, can be picked up for a query that matches any scalar (such as the string "3.14") that can be converted to a number.

If you create a composite multivalue index then the json_table virtual column type specifies the SQL type to use. This means that queries of data that can be converted to the specified SQL type can pick up the index.

However, just as in the non-composite index case, you can use a data-type conversion item method with "only" in its name, to override (further constrain) the specified column type. You use the item method in the column path expression.

For example, if the column type is specified as NUMBER then queries with matching data (such as the string "3.14") that can be converted to a number can pick up the index. But if the column path expression uses item method numberOnly() then only queries that also use numberOnly() can pick up the index.

You can create more than one multivalue index for a given target. For example, you can create one index for a field month that uses item method number() and another for the same field that uses item method string().

The following are not allowed, as ways to create a multivalue index:

  • You cannot specify sibling nested arrays in the json_table expression used to create a composite multivalue index. An error is raised if you try. You can index multiple arrays, but they cannot be siblings, that is, they cannot have the same parent field.

  • Using a SQL NESTED clause (see SQL NESTED Clause Instead of JSON_TABLE).

A type-error mismatch between the type of a scalar JSON value and the corresponding scalar SQL data type of a json_table virtual column can be because of type incompatibility, as put forth in Table 18-2, or because the SQL data type is too constraining — too small to store the data.

Error-handling ERROR ON ERROR NULL ON EMPTY NULL ON MISMATCH returns SQL NULL for the first kind of mismatch, but it raises an error for the second kind. For example, type incompatibility is tolerated when creating an index with SQL type NUMBER for JSON string data, but an error is raised if you try to create an index using SQL type VARCHAR(2) for data that has a JSON string value of "hello", because the data has more than two characters.

Example 30-13 Table PARTS_TAB, for Multivalue Index Examples

Table parts_tab, with JSON data type column jparts, is used in multivalue index examples here. The JSON data includes field subparts whose value is an array with scalar elements.

CREATE TABLE parts_tab (id NUMBER, jparts JSON);

INSERT INTO parts_tab VALUES
  (1, '{"parts" : [{"partno"   : 3, "subparts" : [510, 580, 520]},
                   {"partno"   : 4, "subparts" : 730}]}');

INSERT INTO parts_tab VALUES
  (2, '{"parts" : [{"partno"   : 7, "subparts" : [410, 420, 410]},
                   {"partno"   : 4, "subparts" : [710, 730, 730]}]}');

Example 30-14 Creating a Multivalue Index for JSON_EXISTS

The multivalue index created here indexes the value of field subparts. The table alias (t in this case) is required when using simple dot notation syntax.

If the subparts value targeted by a query is an array then the index can be picked up for any array elements that are numbers. If the value is a scalar then the index can be picked up if the scalar is a number.

Given the data in table parts_tab, a subparts field in each of the objects of array parts in the first row (which has id 1) is indexed: the field in the first object because its array value has elements that are numbers (510, 580, and 520 ) the field in the second object because its value is a number (730).

If item method number() were used in the index definition, instead of numberOnly(), then non-number scalar values (such as the string "730") that can be converted to numbers would also be indexed.

CREATE MULTIVALUE INDEX mvi ON parts_tab t
  (t.jparts.parts.subparts.numberOnly());

Example 30-15 Creating a Composite Multivalue Index for JSON_EXISTS

This example creates a composite multivalue index that targets both field partno and field subparts. The composite index acts like a set of two function-based indexes that target those two fields.

The query uses json_table syntax with a SQL/JSON path expression for the row pattern, $.parts[*]. As must always be the case for multivalue index creation using json_table, the error handling is specified as ERROR ON ERROR NULL ON EMPTY NULL ON MISMATCH.

Column PARTNUM is given SQL data type NUMBER(10) here, which means that, for the index to be used for a query that targets field partno, the value of that field must be convertible to that data type.

  • If type conversion is impossible because the types are generally incompatible, as put forth in Table 18-2, then the NULL ON MISMATCH error handler causes SQL NULL to be returned. An example of this would be a partno string value of "hello" for the SQL partNum column of type NUMBER(10).

  • If, on the other hand, the SQL data type storage is too constraining then an error is raised — the index is not created. An example of this would be a partno string with more than 10 characters, such as "1234567890123".

CREATE MULTIVALUE INDEX cmvi_1 ON parts_tab
  (json_table(jparts, '$.parts[*]'
     ERROR ON ERROR NULL ON EMPTY NULL ON MISMATCH
     COLUMNS (partNum NUMBER(10) PATH '$.partno',
       NESTED PATH '$.subparts[*]'
         COLUMNS (subpartNum NUMBER(20) PATH '$'))));

Example 30-16 Creating a Composite Multivalue Index That Can Target Array Positions

The code in this example is like that in Example 30-15, except that it also specifies virtual column SEQ for ordinality. That means that values in the column just before it, SUBPARTNUM, can be accessed by way of their (one-based) positions in array subparts. (The SQL data type of a FOR ORDINALITY column is always NUMBER.)

As always, at most one entry in a COLUMNS clause can be a column name followed by FOR ORDINALITY, which specifies a column of generated row numbers (SQL data type NUMBER), starting with one. Otherwise, an error is raised when creating the index.

In addition to that general rule for json_table syntax:

  • When json_table is used to create a multivalue index, the FOR ORDINALITY column must be the last column of json_table. (This is not required when json_table is used in queries; it applies only to index creation.)

  • In order for a multivalue index created using json_table to be picked up for a given query, the query must apply a filter expression to the JSON field corresponding to the first virtual column of the json_table expression.

In order for a query that targets array elements by their position to pick up a multivalue index for array positions, the index column for those array elements must be the one immediately before the FOR ORDINALITY column

(The code here uses simple dot notation for the row pattern; if it instead used a SQL/JSON path expression for the row pattern, the rest of the code would be the same.)

CREATE MULTIVALUE INDEX cmvi_2 ON parts_tab t
  (t.jparts.parts[*]
  ERROR ON ERROR NULL ON EMPTY NULL ON MISMATCH
  COLUMNS (partNum NUMBER(10) PATH '$.partno',
    NESTED PATH subparts[*]
      COLUMNS (subpartNum NUMBER(20) PATH '$',
               seq FOR ORDINALITY))));