timestamp_bucket function
The timestamp_bucket
function rounds the given timestamp value to the beginning of the specified interval (bucket). The interval starts at a specified origin on the timeline.
You can use this function for aggregating time series data to a desired time interval, known as periodicity. In certain cases, it is desirable to place all your time series data into equidistant buckets of given periodicity, with each bucket representing the same amount of time.
Syntax:
TIMESTAMP timestamp_bucket(<timestamp>[, interval [,origin ])
- timestamp: The
timestamp
argument takes a TIMESTAMP value or a value that can be cast to TIMESTAMP type. - interval: The
interval
argument is optional and a STRING data type. Theinterval
is specified as<n> unit
.where,
n
specifies the value of the interval. Then
must be > 0unit
defines the interval component. The function supports WEEK, DAY, HOUR, MINUTE, and SECOND in either singular or plural format.For example, "5 MINUTE" or "5 MINUTES".
Note:
The units are not case-sensitive. - origin: The
origin
argument represents the starting point of buckets on the timeline. This argument is optional and takes a TIMESTAMP value. The origin can be of any data type that can be cast to TIMESTAMP type. If not specified, Unix epoch 1970-01-01 is the default value.Note:
The function also rounds the input timestamps that are lesser than theorigin
to the beginning of the specified interval. That is, you can supply anorigin
with a future timestamp value as compared to the input timestamp value on the timeline. - Return Value: TIMESTAMP(9)
The function returns NULL in the following cases:
- If any of the arguments are set to NULL.
- If the input
timestamp
is not castable to TIMESTAMP type.
Example 12-20 Fetch the statistics of the number of passengers departing from the IST airport every 12 hrs with buckets starting from January 1st, 2019. Consider data only for the month of February 2019
SELECT $t AS DATE,
count($t) AS FLIGHTCOUNT
FROM BaggageInfo bag, bag.bagInfo[0].flightLegs[] $f,
timestamp_bucket($f.flightDate, '12 HOURS', '2019-01-01T00') $t
WHERE $f.fltRouteSrc =any "IST" AND timestamp_floor($f.flightDate, 'MONTH') = '2019-02-01T00:00:00Z'
GROUP BY $t
ORDER BY $t
Explanation: To consider passengers traveling in February 2019, use the timestamp_floor function and round down the flightDate
to the beginning of the month. Compare the result with the string "2019-02-01T00:00:00Z". This example supplies the date in an ISO-8601 formatted string, which gets implicitly CAST into a TIMESTAMP value.
To include the transit flights from the IST airport, use the array constructor [ ] to indicate that the flightLegs
is an array and consider each fltRouteSrc
array element in the search.
Use the timsestamp_bucket
function on the flightDate
fields with interval as 12 hours and origin as 1st of January 2019.
{"DATE":"2019-02-02T12:00:00.000000000Z","FLIGHTCOUNT":1}
{"DATE":"2019-02-04T00:00:00.000000000Z","FLIGHTCOUNT":1}
{"DATE":"2019-02-04T12:00:00.000000000Z","FLIGHTCOUNT":2}
{"DATE":"2019-02-07T12:00:00.000000000Z","FLIGHTCOUNT":1}
{"DATE":"2019-02-11T12:00:00.000000000Z","FLIGHTCOUNT":1}
{"DATE":"2019-02-12T00:00:00.000000000Z","FLIGHTCOUNT":2}
{"DATE":"2019-02-12T12:00:00.000000000Z","FLIGHTCOUNT":1}
Example 12-21 From the Streaming media service data, fetch the statistics of number of accounts that expire each week with buckets starting from noon of January 1st, 2023. Consider data from 1st of December 2023
SELECT array_collect(s.account_expiry) as ACCOUNT_EXPIRY,
$t AS WEEK,
count($t) AS ACCOUNTS
FROM stream_acct s,
timestamp_bucket(s.account_expiry, '1 week', '2023-01-01T12') $t
WHERE s.account_expiry >= '2023-12-01'
GROUP BY $t
ORDER BY $t;
Explanation: Use the timsestamp_bucket
function on the account_expiry
fields with interval
as 1 week and origin as 1st of January 12 PM. The function calculates a periodicity of a week from 1st of January 2023, 12 PM and rounds the account_expiry
values to the beginning of the corresponding week bucket.
Notice that you use the array_collect function to display the account_expiry
fields. You can reference only grouping expressions, aggregate functions, or external variables when the SELECT expression includes a grouping function.
{"ACCOUNT_EXPIRY":["2023-12-18T00:00:00.000000000Z"],"WEEK":"2023-12-17T12:00:00.000000000Z","ACCOUNTS":1}
{"ACCOUNT_EXPIRY":["2023-12-31T00:00:00.000000000Z"],"WEEK":"2023-12-24T12:00:00.000000000Z","ACCOUNTS":1}
{"ACCOUNT_EXPIRY":["2024-03-18T00:00:00.000000000Z"],"WEEK":"2024-03-17T12:00:00.000000000Z","ACCOUNTS":1}