Example: Using unnesting with a GROUP BY clause
Consider a TV streaming application. It streams various shows that are watched by customers across the globe. Every show has number of seasons and every season has multiple episodes. You need a persistent meta-data store which keeps track of the current activity of the customers using the TV streaming application. A customer is interested to know about the episodes they watched, the watch time per episode, the total number of seasons of the show they watched etc. The customer also wants the streaming application to start streaming from where they left off watching. The streaming application needs reports on which show is most popular among customers, how many minutes a show is being watched etc. These reports can be generated using UNNEST clause in queries.
Create table and Load data for the TV streaming application
Download the script acctstream_loaddata.sql and run it as shown below. This script creates the table used in the example and loads data into the table.
java -jar lib/kvstore.jar kvlite -secure-config disable
java -jar lib/sql.jar -helper-hosts localhost:5000 -store kvstore
load
command, run the
script.load -file acctstream_loaddata.sql
SELECT $show.showId, count(*) as cnt FROM stream_acct $s,
unnest($s.acct_data.contentStreamed[] as $show)
WHERE $s.acct_data.country = "USA" GROUP BY $show.showId
ORDER BY count(*) DESC
Output:{"showId":15,"cnt":2}
{"showId":16,"cnt":2}
SELECT $show.showId, sum($show.seriesInfo.episodes.minWatched) AS total_time
FROM stream_acct $s, unnest($s.acct_data.contentStreamed[] AS $show)
GROUP BY $show.showId ORDER BY sum($show.seriesInfo.episodes.minWatched)
Output:{"showId":26,"total_time":225}
{"showId":16,"total_time":440}
{"showId":15,"total_time":642}
Note:
The unnest operator ( that is the keywordunnest
) can be omitted
as it is a no-op operator. The use of the UNNEST clause is recommended when there is
an index on the array(s) or map(s) that are being unnested. See Examples: Using Indexes for Query Optimization for more information.
SELECT $show.showId, sum($show.seriesInfo.episodes.minWatched) AS total_time
FROM stream_acct $s, $s.acct_data.contentStreamed[] AS $show
GROUP BY $show.showId ORDER BY sum($show.seriesInfo.episodes.minWatched)
Output:{"showId":26,"total_time":225}
{"showId":16,"total_time":440}
{"showId":15,"total_time":642}
SELECT $show.showId, $seriesInfo.seasonNum,
sum($seriesInfo.episodes.minWatched) AS length
FROM stream_acct n,
unnest(n.acct_data.contentStreamed[] AS $show,
$show.seriesInfo[] as $seriesInfo)
GROUP BY $show.showId, $seriesInfo.seasonNum
ORDER BY sum($seriesInfo.episodes.minWatched)
Output:{"showId":26,"seasonNum":2,"length":80}
{"showId":26,"seasonNum":1,"length":145}
{"showId":16,"seasonNum":2,"length":190}
{"showId":16,"seasonNum":1,"length":250}
{"showId":15,"seasonNum":2,"length":295}
{"showId":15,"seasonNum":1,"length":347}
Example 4: Using unnest with a non-path expression
seq_distinct()
function, which eliminates duplicate values.
SELECT $area, count(*) AS cnt
FROM Users u, seq_distinct(u.address.phones.area) AS $area
GROUP BY $area
Output:{"area":408,"cnt":1}
{"area":831,"cnt":1}
To know more about seq_disctinct
function, see Functions on Sequences.