22.2 COLUMNS Clause of SQL/JSON Function JSON_TABLE

The mandatory COLUMNS clause for SQL/JSON function json_table defines the columns of the virtual table that the function creates.

It consists of the keyword COLUMNS followed by the following entries, enclosed in parentheses. Other than the optional FOR ORDINALITY entry, each entry in the COLUMNS clause is either a regular column specification or a nested columns specification.

  • At most one entry in the COLUMNS clause can be a column name followed by the keywords FOR ORDINALITY, which specifies a column of generated row numbers (SQL data type NUMBER). These numbers start with one. For example:

    COLUMNS (linenum FOR ORDINALITY, ProductID)

    An array step in a row path expression can lead to any number of rows that match the path expression. In particular, the order of array-step indexes and ranges, multiple occurrences of an array index, and duplication of a specified position due to range overlaps produce one row for each position match. The ordinality row numbers reflect this.

  • A regular column specification consists of a column name followed by an optional data type for the column, which can be any SQL data type that can be used in the RETURNING clause of json_value, followed by an optional value clause and an optional PATH clause. The default data type is VARCHAR2(4000).

    The column data type can thus be any of these: BINARY_DOUBLE, BINARY_FLOAT, BOOLEAN, CHAR, CLOB, DATE (with optional keywords PRESERVE TIME or TRUNCATE TIME), DOUBLE PRECISION, FLOAT, INTEGER, NUMBER, INTERVAL YEAR TO MONTH, INTERVAL DAY TO SECOND, NCHAR, NCLOB, NVARCHAR2, RAWFoot 1, REAL, SDO_GEOMETRY, TIMESTAMP, TIMESTAMP WITH TIME ZONE, and VARCHAR2. You can also use a user-defined object type or a collection type.

    Data type SDO_GEOMETRY is used for Oracle Spatial and Graph data. In particular, this means that you can use json_table with GeoJSON data, which is a format for encoding geographic data in JSON.

    Oracle extends the SQL/JSON standard in the case when the returning data type for a column is VARCHAR2(N), by allowing optional keyword TRUNCATE immediately after the data type. When TRUNCATE is present and the value to return is wider than N, the value is truncated — only the first N characters are returned. If TRUNCATE is absent then this case is treated as an error, handled as usual by an error clause or the default error-handling behavior.

  • A nested columns specification consists of the keyword NESTED followed by an optional PATH keyword, a SQL/JSON row path expression, and then a COLUMNS clause. This COLUMNS clause specifies columns that represent nested data. The row path expression used here provides a refined context for the specified nested columns: each nested column path expression is relative to the row path expression. You can nest columns clauses to project values that are present in arrays at different levels to columns of the same row.

    A COLUMNS clause at any level (nested or not) has the same characteristics. In other words, the COLUMNS clause is defined recursively. For each level of nesting (that is, for each use of keyword NESTED), the nested COLUMNS clause is said to be the child of the COLUMNS clause within which it is nested, which is its parent. Two or more COLUMNS clauses that have the same parent clause are siblings.

    The virtual tables defined by parent and child COLUMNS clauses are joined using an outer join, with the parent being the outer table. The virtual columns defined by sibling COLUMNS clauses are joined using a union join.

    Example 22-1 and Example 22-9 illustrate the use of a nested columns clause.

The only thing required in a regular column specification is the column name. Defining the column projection in more detail, by specifying a scalar data type, value handling, or a target path, is optional.

  • The optional value clause specifies how to handle the data projected to the column: whether to handle it as would json_value, json_exists, or json_query. This value handling includes the return data type, return format (pretty or ASCII), wrapper, and error treatment.

    If you use keyword EXISTS then the projected data is handled as if by json_exists (regardless of the column data type).

    Otherwise:

    • For a column of data type JSON, the projected data is handled as if by json_query.

    • For a non-JSON type column (any type that can be used in a json_value RETURNING clause), the projected data is handled by default as if by json_value. But if you use keywords FORMAT JSON then it is handled as if by json_query. You typically use FORMAT JSON only when the projected data is a JSON object or array. (An error is raised if you use FORMAT JSON with a JSON type column.)

    For example, here the value of column FirstName is projected directly using json_value semantics, and the value of column Address is projected as a JSON string using json_query semantics:

    COLUMNS (FirstName, Address FORMAT JSON)

    If json_value semantics are used then the targeted data can alternatively be a JSON array of numbers instead of a JSON scalar value, if the RETURNING type is VECTOR. An error is raised for array input if the return type is not VECTOR or if the array has any non-number elements.

    json_query semantics imply that the projected JSON data is well-formed. If the column is a non-JSON type then this includes ensuring that non-ASCII characters in string values are escaped as needed. For example, a TAB character (CHARACTER TABULATION, U+0009) is escaped as \t. (For JSON type data, any such escaping is done when the JSON data is created, not when json_query is used.)

    When the column has json_query semantics:

    • If database initialization parameter compatible is at least 20 then you can use keywords DISALLOW SCALARS to affect the json_query behavior by excluding scalar JSON values.

    • You can override the default wrapping behavior by adding an explicit wrapper clause.

    You can override the default error handling for a given handler (json_exists, json_value, or json_query) by adding an explicit error clause appropriate for it.

  • The optional PATH clause specifies the portion of the row that is to be used as the column content. The column path expression following keyword PATH is matched against the context item provided by the virtual row. The column path expression must represent a relative path; it is relative to the path specified by the row path expression.

    If the PATH clause is not present then the behavior is the same as if it were present with a path of '$.<column-name>', where <column-name> is the column name. That is, the name of the object field that is targeted is taken implicitly as the column name.

    For purposes of specifying the targeted field only, the SQL identifier used for <column-name> is interpreted case-sensitively, even if it is not quoted. The SQL name of the column itself follows the usual rule: if it is enclosed in double quotation marks (") then the letter case used is significant; otherwise, it is not (it is treated as if uppercase).

    For example, these two COLUMNS clauses are equivalent. For SQL, case is significant only for column Comments (because it is quoted). The other two columns have case-insensitive names (that is, their names are treated case-insensitively), regardless of whether a PATH clause is used. In the first COLUMNS clause the first two columns are written with mixed case that matches the field names they target implicitly.

    COLUMNS(ProductId, Quantity NUMBER, "Comments")
    
    COLUMNS(productid   VARCHAR2(4000) PATH '$.ProductId',
            quantity    NUMBER         PATH '$.Quantity',
            "Comments"  VARCHAR2(4000) PATH '$.Comments')
    

    Example 22-1 presents equivalent queries that illustrate this.

    You can also use the dot notation in a PATH clause, as an alternative to a SQL/JSON path expression. Example 22-2 and Example 22-9 illustrate this.

In a column path-expression array step, the order of indexes and ranges, multiple occurrences of an array index, and duplication of a specified position due to range overlaps have the effect they would have for the particular semantics use for the column: json_exists, json_query, or json_value:

  • json_exists — All that counts is the set of specified positions, not how they are specified, including the order or number of times they are specified. All that is checked is the existence of a match for at least one specified position.

  • json_query — Each occurrence of a specified position is matched against the data, in order.

  • json_value — If only one position is specified then it is matched against the data. Otherwise, there is no match — by default (NULL ON ERROR) a SQL NULL value is returned.

A columns clause with json_value semantics also accepts the optional keyword combination TYPE (STRICT) following the PATH clause. This has the same meaning and behavior as when it is used in connection with the RETURNING clause of json_value. For example, these two queries are equivalent. Only PONumber fields whose value is numeric are considered (projected).

SELECT jt.ponumb
  FROM j_purchaseorder,
       json_table(data, '$'
         COLUMNS (ponumb NUMBER PATH '$.PONumber.numberOnly()')) jt
SELECT jt.ponumb
  FROM j_purchaseorder,
       json_table(data, '$'
         COLUMNS (ponumb NUMBER PATH '$.PONumber' TYPE (STRICT))) jt

See Also:



Footnote Legend

Footnote 1: You can use RAW as the return type only when the input data is of JSON data type.