Optimizing unnesting queries with the UNNEST clause
Note:
See Limitation for expression usage in the UNNEST clause for more details on the restrictions placed by the UNNEST clause. For SQL statements to create tables and load data for using unnest queries, see Example 5-3.Example 9-22 Fetch different shows aired in the US and the number of people watching them
- The query is specific to a country (US) and a specific show. So create an index on the
country
andshowid
fields of thestream_acct
table.CREATE INDEX idx_country_showid ON stream_acct( acct_data.country AS string, acct_data.contentStreamed[].showId AS integer) WITH UNIQUE KEYS PER ROW
Note:
The index must be created with the “unique keys per row property” in order for such indexes to be usable by queries that unnest the same array(s)/map(s) as the index. - The query to fetch different shows aired in the US and number of people watching it with the UNNEST clause.
Output:SELECT $show.showId, count(*) as cnt FROM stream_acct $s, unnest($s.acct_data.contentStreamed[] as $show) WHERE $s.acct_data.country = "USA" GROUP BY $show.showId ORDER BY count(*) desc
The above query will use the{"showId":15,"cnt":2} {"showId":16,"cnt":2}
idx_country_showid
index. The country condition will be pushed to the index, the group-by will be index-based, and the index is a covering one for this query. An index that contains all required information to resolve the query is known as a Covering Index – it completely covers the query. Covering Index includes all the columns, the query refers to in the SELECT, JOIN, and WHERE clauses. If the UNNEST clause is not used, the index will not be considered.
Example 9-23 For every show aired by the application, the total watch time for all users
SELECT $show.showId, sum($show.seriesInfo.episodes.minWatched)
AS total_time FROM stream_acct $s,
unnest($s.acct_data.contentStreamed[] AS $show)
GROUP BY $show.showId
ORDER BY sum($show.seriesInfo.episodes.minWatched)
Let us examine the effect of creating an index on acct_data.contentStreamed[].showId
as the data is been grouped based on showId
.
Despite the use of the UNNEST clause, this query cannot use the idx_showid
index. This is because of the argument to the sum()
function. The idx_showid
index contains just the showId
(and the primary key). So, the expression $show.seriesInfo.episodes.minWatched
cannot be evaluated from the index. There are two ways to optimize this query.
Option 1: Create an additional index:
showId
and minWatched
fields as both are used in the query.CREATE INDEX idx_showid_minWatched ON
stream_acct(acct_data.contentStreamed[].showId AS integer,
acct_data.contentStreamed[].seriesInfo[].episodes[].minWatched AS integer,
acct_data.contentStreamed[].seriesInfo[].episodes[].episodeID as integer)
WITH UNIQUE KEYS PER ROW
Note:
TheepisodeID
must be added in this index, as the last index path, in order for the "unique keys per row" constraint to be satisfied. This index will be used by the query, as a covering index.
Option 2: Avoid the cost of an additional index:
idx_showid
index.The rewritten query below uses the idx_showid
index, but the index is not covering.SELECT $show.showId, sum($s.acct_data.contentStreamed[$element.showId = $show.showId].
seriesInfo.episodes.minWatched) AS total_time
FROM stream_acct $s, unnest($s.acct_data.contentStreamed[] AS $show) GROUP BY $show.showId
ORDER BY sum($s.acct_data.contentStreamed[$element.showId = $show.showId].seriesInfo.episodes.minWatched)
Output:
{"showId":26,"total_time":225}
{"showId":16,"total_time":440}
{"showId":15,"total_time":642}
Example 9-24 The total watch time of users per show and season
SELECT $show.showId, $seriesInfo.seasonNum,
sum($seriesInfo.episodes.minWatched) AS length
FROM stream_acct n,
unnest(n.acct_data.contentStreamed[] AS $show, $show.seriesInfo[] as $seriesInfo)
GROUP BY $show.showId, $seriesInfo.seasonNum
ORDER BY sum($seriesInfo.episodes.minWatched)
Output:
{"showId":26,"seasonNum":2,"length":80}
{"showId":26,"seasonNum":1,"length":145}
{"showId":16,"seasonNum":2,"length":190}
{"showId":16,"seasonNum":1,"length":250}
{"showId":15,"seasonNum":2,"length":295}
{"showId":15,"seasonNum":1,"length":347}
idx_showid_seasonNum_minWatched
is a multi key index. The episodeID
must be added in this index, as the last index path, in order for the "unique keys per row" constraint to be satisfied. CREATE INDEX idx_showid_seasonNum_minWatched ON
stream_acct(acct_data.contentStreamed[].showId as integer,
acct_data.contentStreamed[].seriesInfo[].seasonNum as integer,
acct_data.contentStreamed[].seriesInfo[].episodes[].minWatched as integer,
acct_data.contentStreamed[].seriesInfo[].episodes[].episodeID as integer)
WITH UNIQUE KEYS PER ROW
Note:
An index is called a multikey index if for each row of data in the table, there are multiple entries created in the index. In a multikey index, there is at least one index path that uses .keys(), .values(), or [] steps. Any such index path will be called a multikey index path.If you want your query to use an index on the array(s)/maps() that it is unnesting, each path expression in the UNNEST clause must match with the multikey_path_prefix of an index path in the index. As shown in the syntax for the CREATE INDEX Statement the multikey_path_prefix is the part of an index path up to and including the last multikey step. For example, the multikey_path_prefix of the first index path should match the first path expression in the UNNEST clause and so on. If this is not the case, the index will not be used.
In the above query the expression n.value.contentStreamed[]
matches the multikey_path_prefix of the first index path in idx_showid_seasonNum_minWatched
, and the expression $show.seriesInfo[]
matches the multikey_path_prefix of the second index path, after the $show
variable is replaced with its domain expression. So this index will be used in the query.