Example: Using unnesting with a GROUP BY clause
Consider a TV streaming application. It streams various shows that are watched by customers across the globe. Every show has number of seasons and every season has multiple episodes. You need a persistent meta-data store which keeps track of the current activity of the customers using the TV streaming application. A customer is interested to know about the episodes they watched, the watch time per episode, the total number of seasons of the show they watched etc. The customer also wants the streaming application to start streaming from where they left off watching. The streaming application needs reports on which show is most popular among customers, how many minutes a show is being watched etc. These reports can be generated using UNNEST clause in queries.
To follow along with the examples, create the stream_acct
table and insert data as described in the Tables Used in the Examples section.
Example 6-101 Fetch the different shows watched by people in the US alone and the number of people watching them
SELECT $show.showId, count(*) as cnt FROM stream_acct $s,
unnest($s.acct_data.contentStreamed[] as $show)
WHERE $s.acct_data.country = "USA" GROUP BY $show.showId
ORDER BY count(*) DESC
Output:{"showId":15,"cnt":2}
{"showId":16,"cnt":2}
Example 6-102 For every show aired by the application, fetch the total watch time by all users
SELECT $show.showId, sum($show.seriesInfo.episodes.minWatched) AS total_time
FROM stream_acct $s, unnest($s.acct_data.contentStreamed[] AS $show)
GROUP BY $show.showId ORDER BY sum($show.seriesInfo.episodes.minWatched)
Output:{"showId":26,"total_time":225}
{"showId":16,"total_time":440}
{"showId":15,"total_time":642}
Note:
The unnest operator ( that is the keywordunnest
) can be omitted as it is a no-op operator. The use of the UNNEST clause is recommended when there is an index on the array(s) or map(s) that are being unnested. See Examples: Using Indexes for Query Optimization for more information.
SELECT $show.showId, sum($show.seriesInfo.episodes.minWatched) AS total_time
FROM stream_acct $s, $s.acct_data.contentStreamed[] AS $show
GROUP BY $show.showId ORDER BY sum($show.seriesInfo.episodes.minWatched)
Output:{"showId":26,"total_time":225}
{"showId":16,"total_time":440}
{"showId":15,"total_time":642}
Example 6-103 Fetch the total watch time of users per show and season
SELECT $show.showId, $seriesInfo.seasonNum,
sum($seriesInfo.episodes.minWatched) AS length
FROM stream_acct n,
unnest(n.acct_data.contentStreamed[] AS $show,
$show.seriesInfo[] as $seriesInfo)
GROUP BY $show.showId, $seriesInfo.seasonNum
ORDER BY sum($seriesInfo.episodes.minWatched)
Output:{"showId":26,"seasonNum":2,"length":80}
{"showId":26,"seasonNum":1,"length":145}
{"showId":16,"seasonNum":2,"length":190}
{"showId":16,"seasonNum":1,"length":250}
{"showId":15,"seasonNum":2,"length":295}
{"showId":15,"seasonNum":1,"length":347}
Example 6-104 Using unnest with a non-path expression
seq_distinct()
function, which eliminates duplicate values. SELECT $area, count(*) AS cnt
FROM Users u, seq_distinct(u.address.phones.area) AS $area
GROUP BY $area
Output:{"area":408,"cnt":1}
{"area":831,"cnt":1}
To know more about seq_disctinct
function, see Functions on Sequences.