timestamp_bucket function

The timestamp_bucket function rounds the given timestamp value to the beginning of the specified interval (bucket). The interval starts at a specified origin on the timeline.

You can use this function for aggregating time series data to a desired time interval, known as periodicity. In certain cases, it is desirable to place all your time series data into equidistant buckets of given periodicity, with each bucket representing the same amount of time.

Syntax:

TIMESTAMP timestamp_bucket(<timestamp>[, interval [,origin ])
Semantics:
  • timestamp: The timestamp argument takes a TIMESTAMP value or a value that can be cast to TIMESTAMP type.
  • interval: The interval argument is optional and a STRING data type. The interval is specified as <n> unit.

    where,

    n specifies the value of the interval. The n must be > 0

    unit defines the interval component. The function supports WEEK, DAY, HOUR, MINUTE, and SECOND in either singular or plural format.

    For example, "5 MINUTE" or "5 MINUTES".

    Note:

    The units are not case-sensitive.
  • origin: The origin argument represents the starting point of buckets on the timeline. This argument is optional and takes a TIMESTAMP value. The origin can be of any data type that can be cast to TIMESTAMP type. If not specified, Unix epoch 1970-01-01 is the default value.

    Note:

    The function also rounds the input timestamps that are lesser than the origin to the beginning of the specified interval. That is, you can supply an origin with a future timestamp value as compared to the input timestamp value on the timeline.
  • Return Value: TIMESTAMP(9)
    The function returns NULL in the following cases:
    • If any of the arguments are set to NULL.
    • If the input timestamp is not castable to TIMESTAMP type.

Example 12-16 Fetch the statistics of the number of passengers departing from the IST airport every 12 hrs with buckets starting from January 1st, 2019. Consider data only for the month of February 2019

SELECT $t AS DATE,
count($t) AS FLIGHTCOUNT 
FROM BaggageInfo bag, bag.bagInfo[0].flightLegs[] $f, 
timestamp_bucket($f.flightDate, '12 HOURS', '2019-01-01T00') $t
WHERE $f.fltRouteSrc =any "IST" AND timestamp_floor($f.flightDate, 'MONTH') = '2019-02-01T00:00:00Z'
GROUP BY $t
ORDER BY $t

Explanation: To consider passengers traveling in February 2019, use the timestamp_floor function and round down the flightDate to the beginning of the month. Compare the result with the string "2019-02-01T00:00:00Z". This example supplies the date in an ISO-8601 formatted string, which gets implicitly CAST into a TIMESTAMP value.

To include the transit flights from the IST airport, use the array constructor [ ] to indicate that the flightLegs is an array and consider each fltRouteSrc array element in the search.

Use the timsestamp_bucket function on the flightDate fields with interval as 12 hours and origin as 1st of January 2019.

Output:
{"DATE":"2019-02-02T12:00:00.000000000Z","FLIGHTCOUNT":1}
{"DATE":"2019-02-04T00:00:00.000000000Z","FLIGHTCOUNT":1}
{"DATE":"2019-02-04T12:00:00.000000000Z","FLIGHTCOUNT":2}
{"DATE":"2019-02-07T12:00:00.000000000Z","FLIGHTCOUNT":1}
{"DATE":"2019-02-11T12:00:00.000000000Z","FLIGHTCOUNT":1}
{"DATE":"2019-02-12T00:00:00.000000000Z","FLIGHTCOUNT":2}
{"DATE":"2019-02-12T12:00:00.000000000Z","FLIGHTCOUNT":1}

Example 12-17 From the Streaming media service data, fetch the statistics of number of accounts that expire each week with buckets starting from noon of January 1st, 2023. Consider data from 1st of December 2023

SELECT array_collect(s.account_expiry) as ACCOUNT_EXPIRY,
$t AS WEEK,
count($t) AS ACCOUNTS
FROM stream_acct s,
timestamp_bucket(s.account_expiry, '1 week', '2023-01-01T12') $t
WHERE s.account_expiry >= '2023-12-01'
GROUP BY $t
ORDER BY $t;

Explanation: Use the timsestamp_bucket function on the account_expiry fields with interval as 1 week and origin as 1st of January 12 PM. The function calculates a periodicity of a week from 1st of January 2023, 12 PM and rounds the account_expiry values to the beginning of the corresponding week bucket.

Notice that you use the array_collect function to display the account_expiry fields. You can reference only grouping expressions, aggregate functions, or external variables when the SELECT expression includes a grouping function.

Output:
{"ACCOUNT_EXPIRY":["2023-12-18T00:00:00.000000000Z"],"WEEK":"2023-12-17T12:00:00.000000000Z","ACCOUNTS":1}
{"ACCOUNT_EXPIRY":["2023-12-31T00:00:00.000000000Z"],"WEEK":"2023-12-24T12:00:00.000000000Z","ACCOUNTS":1}
{"ACCOUNT_EXPIRY":["2024-03-18T00:00:00.000000000Z"],"WEEK":"2024-03-17T12:00:00.000000000Z","ACCOUNTS":1}