5.1.9 Prepare Time Series Data
OML4R provides you with the ability to perform many data preparation operations on time series data, such as filtering, ordering, and transforming the data.
OML4R maps R data types to SQL data types, which allows you to create OML4R objects and perform data preparation operations in database memory. The following examples demonstrate some operations on time series data.
Example 5-22 Aggregating Date and Time Data
This example illustrates some of the statistical aggregation functions. For a data set, the example first generates on the local client a sequence of five hundred dates spread evenly throughout 2001. It then introduces a random difftime
and a vector of random normal values. The example then uses the ore.push
function to create MYDATA
, an in-database version of the data. The example calls the class
function to show that MYDATA
is an ore.frame
object and that the datetime
column is of class ore.datetime
. The example displays the first three rows of the generated data. It then uses the statistical aggregation operations of min
, max
, range
, median
, and quantile
on the datetime
column of MYDATA
.
N <- 500 mydata <- data.frame(datetime = seq(as.POSIXct("2001/01/01"), as.POSIXct("2001/12/31"), length.out = N), difftime = as.difftime(runif(N), units = "mins"), x = rnorm(N)) MYDATA <- ore.push(mydata) class(MYDATA) class(MYDATA$datetime) head(MYDATA,3) # statistical aggregations min(MYDATA$datetime) max(MYDATA$datetime) range(MYDATA$datetime) quantile(MYDATA$datetime, probs = c(0, 0.05, 0.10))
Listing for This Example
R> N <- 500
R> mydata <- data.frame(datetime =
+ seq(as.POSIXct("2001/01/01"),
+ as.POSIXct("2001/12/31"),
+ length.out = N),
+ difftime = as.difftime(runif(N),
+ units = "mins"),
+ x = rnorm(N))
R> MYDATA <- ore.push(mydata)
R> class(MYDATA)
[1] "ore.frame"
attr(,"package")
[1] "OREbase"
R> class(MYDATA$datetime)
[1] "ore.datetime"
attr(,"package")
[1] "OREbase"
R> head(MYDATA,3)
datetime difftime x
1 2001-01-01 00:00:00 16.436782 secs 0.68439244
2 2001-01-01 17:30:25 8.711562 secs 1.38481435
3 2001-01-02 11:00:50 1.366927 secs -0.00927078
R> # statistical aggregations
R> min(MYDATA$datetime)
[1] "2001-01-01 CST"
R> max(MYDATA$datetime)
[1] "2001-12-31 CST"
R> range(MYDATA$datetime)
[1] "2001-01-01 CST" "2001-12-31 CST"
R> quantile(MYDATA$datetime,
+ probs = c(0, 0.05, 0.10))
0% 5% 10%
"2001-01-01 00:00:00 CST" "2001-01-19 04:48:00 CST" "2001-02-06 09:36:00 CST"
Example 5-23 Using Date and Time Arithmetic
This example creates a one day shift by taking the datetime
column of the MYDATA
ore.frame
object created in the previous example and adding a difftime
of one day. The result is day1Shift
, which the example shows is of class ore.datetime
. The example displays the first three elements of the datetime
column of MYDATA
and those of day1Shift
. The first element of day1Shift
is January 2, 2001.
This example also computes lag differences using the overloaded diff
function. The difference between the dates is all the same because the 500 dates in MYDATA
are evenly distributed throughout 2001.
day1Shift <- MYDATA$datetime + as.difftime(1, units = "days") class(day1Shift) head(MYDATA$datetime,3) head(day1Shift,3) lag1Diff <- diff(MYDATA$datetime) class(lag1Diff) head(lag1Diff,3)
Listing for This Example
R> day1Shift <- MYDATA$datetime + as.difftime(1, units = "days")
R> class(day1Shift)
[1] "ore.datetime"
attr(,"package")
[1] "OREbase"
R> head(MYDATA$datetime,3)
[1] "2001-01-01 00:00:00 CST" "2001-01-01 17:30:25 CST" "2001-01-02 11:00:50 CST"
R> head(day1Shift,3)
[1] "2001-01-02 00:00:00 CST" "2001-01-02 17:30:25 CST" "2001-01-03 11:00:50 CST"
R> lag1Diff <- diff(MYDATA$datetime)
R> class(lag1Diff)
[1] "ore.difftime"
attr(,"package")
[1] "OREbase"
R> head(lag1Diff,3)
Time differences in secs
[1] 63025.25 63025.25 63025.25
Example 5-24 Comparing Dates and Times
This example demonstrates date and time comparisons. The example uses the datetime
column of the MYDATA
ore.frame
object created in the first example. This example selects the elements of MYDATA
that have a date earlier than April 1, 2001. The resulting isQ1
is of class ore.logical
and for the first three entries the result is TRUE
. The example finds out how many dates matching isQ1
are in March. It then sums the logical vector and displays the result, which is that 43 rows are in March. The example next filters rows based on dates that are the end of the year, after December 27. The result is eoySubset
, which is an ore.frame
object. The example displays the first three rows returned in eoySubset
.
isQ1 <- MYDATA$datetime < as.Date("2001/04/01") class(isQ1) head(isQ1,3) isMarch <- isQ1 & MYDATA$datetime > as.Date("2001/03/01") class(isMarch) head(isMarch,3) sum(isMarch) eoySubset <- MYDATA[MYDATA$datetime > as.Date("2001/12/27"), ] class(eoySubset) head(eoySubset,3)
Listing for This Example
R> isQ1 <- MYDATA$datetime < as.Date("2001/04/01")
R> class(isQ1)
[1] "ore.logical"
attr(,"package")
[1] "OREbase"
R> head(isQ1,3)
[1] TRUE TRUE TRUE
R> isMarch <- isQ1 & MYDATA$datetime > as.Date("2001/03/01")
R> class(isMarch)
[1] "ore.logical"
attr(,"package")
[1] "OREbase"
R> head(isMarch,3)
[1] FALSE FALSE FALSE
R> sum(isMarch)
[1] 43
R> eoySubset <- MYDATA[MYDATA$datetime > as.Date("2001/12/27"), ]
R> class(eoySubset)
[1] "ore.frame"
attr(,"package")
[1] "OREbase"
R> head(eoySubset,3)
datetime difftime x
495 2001-12-27 08:27:53 55.76474 secs -0.2740492
496 2001-12-28 01:58:18 15.42946 secs -1.4547270
497 2001-12-28 19:28:44 28.62195 secs 0.2929171
Example 5-25 Using Date and Time Accessors
OML4R has accessor functions that you can use to extract various components from datetime
objects, such as year, month, day of the month, hour, minute, and second. This example demonstrates the use of these functions. The example uses the datetime
column of the MYDATA
ore.frame
object created in the first example.
This example gets the year elements of the datetime
column. The invocation of the unique
function for year
displays 2001
because it is the only year value in the column. However, for objects that have a range of values, as for example, ore.mday
, the range
function returns the day of the month. The result contains a vector with values that range from 1 through 31. Invoking the range function succinctly reports the range of values, as demonstrated for the other accessor functions.
year <- ore.year(MYDATA$datetime) unique(year) month <- ore.month(MYDATA$datetime) range(month) dayOfMonth <- ore.mday(MYDATA$datetime) range(dayOfMonth) hour <- ore.hour(MYDATA$datetime) range(hour) minute <- ore.minute(MYDATA$datetime) range(minute) second <- ore.second(MYDATA$datetime) range(second)
Listing for This Example
R> year <- ore.year(MYDATA$datetime)
R> unique(year)
[1] 2001
R> month <- ore.month(MYDATA$datetime)
R> range(month)
[1] 1 12
R> dayOfMonth <- ore.mday(MYDATA$datetime)
R> range(dayOfMonth)
[1] 1 31
R> hour <- ore.hour(MYDATA$datetime)
R> range(hour)
[1] 0 23
R> minute <- ore.minute(MYDATA$datetime)
R> range(minute)
[1] 0 59
R> second <- ore.second(MYDATA$datetime)
R> range(second)
[1] 0.00000 59.87976
Example 5-26 Coercing Date and Time Data Types
This example uses the as.ore
subclass objects to coerce an ore.datetime
data type into other data types. The example uses the datetime
column of the MYDATA
ore.frame
object created in the first example. That column contains ore.datetime
values. This example first extracts the date from the MYDATA$datetime
column. The resulting dateOnly
object has ore.date
values that contain only the year, month, and day, but not the time. The example then coerces the ore.datetime
values into objects with ore.character
and ore.integer
values that represent the names of days, the number of the day of the year, and the quarter of the year.
dateOnly <- as.ore.date(MYDATA$datetime) class(dateOnly) head(sort(unique(dateOnly)),3) nameOfDay <- as.ore.character(MYDATA$datetime, format = "DAY") class(nameOfDay) sort(unique(nameOfDay)) dayOfYear <- as.integer(as.character(MYDATA$datetime, format = "DDD")) class(dayOfYear) range(dayOfYear) quarter <- as.integer(as.character(MYDATA$datetime, format = "Q")) class(quarter) sort(unique(quarter))
Listing for This Example
R> dateOnly <- as.ore.date(MYDATA$datetime)
R> class(dateOnly)[1] "ore.date"
attr(,"package")[1] "OREbase"
R> head(sort(unique(dateOnly)),3)
[1] "2001-01-01" "2001-01-02" "2001-01-03"
R> nameOfDay <- as.ore.character(MYDATA$datetime, format = "DAY")
R> class(nameOfDay)
[1] "ore.character"
attr(,"package")
[1] "OREbase"
R> sort(unique(nameOfDay))
[1] "FRIDAY " "MONDAY " "SATURDAY " "SUNDAY " "THURSDAY " "TUESDAY " "WEDNESDAY"
R> dayOfYear <- as.integer(as.character(MYDATA$datetime, format = "DDD"))
R> class(dayOfYear)
[1] "ore.integer"
attr(,"package")
[1] "OREbase"
R> range(dayOfYear)
[1] 1 365
R> quarter <- as.integer(as.character(MYDATA$datetime, format = "Q"))
R> class(quarter)
[1] "ore.integer"
attr(,"package")
[1] "OREbase"
R> sort(unique(quarter))
[1] 1 2 3 4
Example 5-27 Using a Window Function
This example uses the window functions ore.rollmean
and ore.rollsd
to compute the rolling mean and the rolling standard deviation. The example uses the MYDATA
ore.frame
object created in the first example. This example ensures that MYDATA
is an ordered ore.frame
by assigning the values of the datetime
column as the row names of MYDATA
. The example computes the rolling mean and the rolling standard deviation over five periods. Next, to use the R time series functionality in the stats
package, the example pulls data to the client. To limit the data pulled to the client, it uses the vector is.March
from the third example to select only the data points in March. The example creates a time series object using the ts
function, builds the Arima model, and predicts three points out.
row.names(MYDATA) <- MYDATA$datetime MYDATA$rollmean5 <- ore.rollmean(MYDATA$x, k = 5) MYDATA$rollsd5 <- ore.rollsd (MYDATA$x, k = 5) head(MYDATA) marchData <- ore.pull(MYDATA[isMarch,]) tseries.x <- ts(marchData$x) arima110.x <- arima(tseries.x, c(1,1,0)) predict(arima110.x, 3) tseries.rm5 <- ts(marchData$rollmean5) arima110.rm5 <- arima(tseries.rm5, c(1,1,0)) predict(arima110.rm5, 3)
Listing for This Example
R> row.names(MYDATA) <- MYDATA$datetime
R> MYDATA$rollmean5 <- ore.rollmean(MYDATA$x, k = 5)
R> MYDATA$rollsd5 <- ore.rollsd (MYDATA$x, k = 5)
R> head(MYDATA)
datetime difftime
2001-01-01 00:00:00 2001-01-01 00:00:00 39.998460 secs
x rollmean5 rollsd5
-0.3450421 -0.46650761 0.8057575
datetime difftime
2001-01-01 17:30:25 2001-01-01 17:30:25 37.75568 secs
x rollmean5 rollsd5
-1.3261019 0.02877517 1.1891384
datetime difftime
2001-01-02 11:00:50 2001-01-02 11:00:50 18.44243 secs
x rollmean5 rollsd5
0.2716211 -0.13224503 1.0909515
datetime difftime
2001-01-03 04:31:15 2001-01-03 04:31:15 38.594384 secs
x rollmean5 rollsd5
1.5146235 0.36307913 1.4674456
datetime difftime
2001-01-03 22:01:41 2001-01-03 22:01:41 2.520976 secs
x rollmean5 rollsd5
-0.7763258 0.80073340 1.1237925
datetime difftime
2001-01-04 15:32:06 2001-01-04 15:32:06 56.333281 secs
x rollmean5 rollsd5
2.1315787 0.90287282 1.0862614
R> marchData <- ore.pull(MYDATA[isMarch,])
R> tseries.x <- ts(marchData$x)
R> arima110.x <- arima(tseries.x, c(1,1,0))
R> predict(arima110.x, 3)
$pred
Time Series:
Start = 44
End = 46
Frequency = 1
[1] 1.4556614 0.6156379 1.1387587
$se
Time Series:
Start = 44
End = 46
Frequency = 1
[1] 1.408117 1.504988 1.850830
R> tseries.rm5 <- ts(marchData$rollmean5)
R> arima110.rm5 <- arima(tseries.rm5, c(1,1,0))
R> predict(arima110.rm5, 3)
$pred
Time Series:
Start = 44
End = 46
Frequency = 1
[1] 0.3240135 0.3240966 0.3240922
$se
Time Series:
Start = 44
End = 46
Frequency = 1
[1] 0.3254551 0.4482886 0.5445763