Prepare Time Series Data

5.1.9 Prepare Time Series Data

OML4R provides you with the ability to perform many data preparation operations on time series data, such as filtering, ordering, and transforming the data.

OML4R maps R data types to SQL data types, which allows you to create OML4R objects and perform data preparation operations in database memory. The following examples demonstrate some operations on time series data.

Example 5-22 Aggregating Date and Time Data

This example illustrates some of the statistical aggregation functions. For a data set, the example first generates on the local client a sequence of five hundred dates spread evenly throughout 2001. It then introduces a random difftime and a vector of random normal values. The example then uses the ore.push function to create MYDATA, an in-database version of the data. The example calls the class function to show that MYDATA is an ore.frame object and that the datetime column is of class ore.datetime. The example displays the first three rows of the generated data. It then uses the statistical aggregation operations of min, max, range, median, and quantile on the datetime column of MYDATA.

N <- 500
mydata <- data.frame(datetime = 
                seq(as.POSIXct("2001/01/01"),
                    as.POSIXct("2001/12/31"),
                    length.out = N),
                difftime = as.difftime(runif(N),
                                 units = "mins"),
                x = rnorm(N))
MYDATA <- ore.push(mydata)
class(MYDATA)
class(MYDATA$datetime)
head(MYDATA,3)
# statistical aggregations
min(MYDATA$datetime)
max(MYDATA$datetime)
range(MYDATA$datetime)
quantile(MYDATA$datetime,
         probs = c(0, 0.05, 0.10))

Listing for This Example

R> N <- 500
R> mydata <- data.frame(datetime = 
+             seq(as.POSIXct("2001/01/01"),
+                 as.POSIXct("2001/12/31"),
+                 length.out = N),
+             difftime = as.difftime(runif(N),
+                              units = "mins"),
+             x = rnorm(N))
R> MYDATA <- ore.push(mydata)
R> class(MYDATA)
[1] "ore.frame"
attr(,"package")
[1] "OREbase"
R> class(MYDATA$datetime)
[1] "ore.datetime"
attr(,"package")
[1] "OREbase"
R> head(MYDATA,3)
             datetime       difftime           x
1 2001-01-01 00:00:00 16.436782 secs  0.68439244
2 2001-01-01 17:30:25  8.711562 secs  1.38481435
3 2001-01-02 11:00:50  1.366927 secs -0.00927078

R> # statistical aggregations
R> min(MYDATA$datetime)
[1] "2001-01-01 CST"
R> max(MYDATA$datetime)
[1] "2001-12-31 CST"
R> range(MYDATA$datetime)
[1] "2001-01-01 CST" "2001-12-31 CST"
R> quantile(MYDATA$datetime,
+           probs = c(0, 0.05, 0.10))
                       0%                        5%                       10% 
"2001-01-01 00:00:00 CST" "2001-01-19 04:48:00 CST" "2001-02-06 09:36:00 CST"

Example 5-23 Using Date and Time Arithmetic

This example creates a one day shift by taking the datetime column of the MYDATA ore.frame object created in the previous example and adding a difftime of one day. The result is day1Shift, which the example shows is of class ore.datetime. The example displays the first three elements of the datetime column of MYDATA and those of day1Shift. The first element of day1Shift is January 2, 2001.

This example also computes lag differences using the overloaded diff function. The difference between the dates is all the same because the 500 dates in MYDATA are evenly distributed throughout 2001.

day1Shift <- MYDATA$datetime + as.difftime(1, units = "days")
class(day1Shift)
head(MYDATA$datetime,3)
head(day1Shift,3)
lag1Diff <- diff(MYDATA$datetime)
class(lag1Diff)
head(lag1Diff,3)

Listing for This Example

R> day1Shift <- MYDATA$datetime + as.difftime(1, units = "days")
R> class(day1Shift)
[1] "ore.datetime"
attr(,"package")
[1] "OREbase"
R> head(MYDATA$datetime,3)
[1] "2001-01-01 00:00:00 CST" "2001-01-01 17:30:25 CST" "2001-01-02 11:00:50 CST"
R> head(day1Shift,3)
[1] "2001-01-02 00:00:00 CST" "2001-01-02 17:30:25 CST" "2001-01-03 11:00:50 CST"
R> lag1Diff <- diff(MYDATA$datetime)
R> class(lag1Diff)
[1] "ore.difftime"
attr(,"package")
[1] "OREbase"
R> head(lag1Diff,3)
Time differences in secs
[1] 63025.25 63025.25 63025.25

Example 5-24 Comparing Dates and Times

This example demonstrates date and time comparisons. The example uses the datetime column of the MYDATA ore.frame object created in the first example. This example selects the elements of MYDATA that have a date earlier than April 1, 2001. The resulting isQ1 is of class ore.logical and for the first three entries the result is TRUE. The example finds out how many dates matching isQ1 are in March. It then sums the logical vector and displays the result, which is that 43 rows are in March. The example next filters rows based on dates that are the end of the year, after December 27. The result is eoySubset, which is an ore.frame object. The example displays the first three rows returned in eoySubset.

isQ1 <- MYDATA$datetime < as.Date("2001/04/01")
class(isQ1)
head(isQ1,3)
isMarch <- isQ1 & MYDATA$datetime > as.Date("2001/03/01")
class(isMarch)
head(isMarch,3)
sum(isMarch)
eoySubset <- MYDATA[MYDATA$datetime > as.Date("2001/12/27"), ]
class(eoySubset)
head(eoySubset,3)

Listing for This Example

R> isQ1 <- MYDATA$datetime < as.Date("2001/04/01")
R> class(isQ1)
[1] "ore.logical"
attr(,"package")
[1] "OREbase"
R> head(isQ1,3)
[1] TRUE TRUE TRUE
R> isMarch <- isQ1 & MYDATA$datetime > as.Date("2001/03/01")
R> class(isMarch)
[1] "ore.logical"
attr(,"package")
[1] "OREbase"
R> head(isMarch,3)
[1] FALSE FALSE FALSE
R> sum(isMarch)
[1] 43
R> eoySubset <- MYDATA[MYDATA$datetime > as.Date("2001/12/27"), ]
R> class(eoySubset)
[1] "ore.frame"
attr(,"package")
[1] "OREbase"
R> head(eoySubset,3)
               datetime      difftime          x
495 2001-12-27 08:27:53 55.76474 secs -0.2740492
496 2001-12-28 01:58:18 15.42946 secs -1.4547270
497 2001-12-28 19:28:44 28.62195 secs  0.2929171

Example 5-25 Using Date and Time Accessors

OML4R has accessor functions that you can use to extract various components from datetime objects, such as year, month, day of the month, hour, minute, and second. This example demonstrates the use of these functions. The example uses the datetime column of the MYDATA ore.frame object created in the first example.

This example gets the year elements of the datetime column. The invocation of the unique function for year displays 2001 because it is the only year value in the column. However, for objects that have a range of values, as for example, ore.mday, the range function returns the day of the month. The result contains a vector with values that range from 1 through 31. Invoking the range function succinctly reports the range of values, as demonstrated for the other accessor functions.

year <- ore.year(MYDATA$datetime)
unique(year)
month <- ore.month(MYDATA$datetime)
range(month)
dayOfMonth <- ore.mday(MYDATA$datetime)
range(dayOfMonth)
hour <- ore.hour(MYDATA$datetime)
range(hour)
minute <- ore.minute(MYDATA$datetime)
range(minute)
second <- ore.second(MYDATA$datetime)
range(second)

Listing for This Example

R> year <- ore.year(MYDATA$datetime)
R> unique(year)
[1] 2001
R> month <- ore.month(MYDATA$datetime)
R> range(month)
[1]  1 12
R> dayOfMonth <- ore.mday(MYDATA$datetime)
R> range(dayOfMonth)
[1]  1 31
R> hour <- ore.hour(MYDATA$datetime)
R> range(hour)
[1]  0 23
R> minute <- ore.minute(MYDATA$datetime)
R> range(minute)
[1]  0 59
R> second <- ore.second(MYDATA$datetime)
R> range(second)
[1]  0.00000 59.87976

Example 5-26 Coercing Date and Time Data Types

This example uses the as.ore subclass objects to coerce an ore.datetime data type into other data types. The example uses the datetime column of the MYDATA ore.frame object created in the first example. That column contains ore.datetime values. This example first extracts the date from the MYDATA$datetime column. The resulting dateOnly object has ore.date values that contain only the year, month, and day, but not the time. The example then coerces the ore.datetime values into objects with ore.character and ore.integer values that represent the names of days, the number of the day of the year, and the quarter of the year.

dateOnly <- as.ore.date(MYDATA$datetime)
class(dateOnly)
head(sort(unique(dateOnly)),3)
nameOfDay <- as.ore.character(MYDATA$datetime, format = "DAY")
class(nameOfDay)
sort(unique(nameOfDay))
dayOfYear <- as.integer(as.character(MYDATA$datetime, format = "DDD"))
class(dayOfYear)
range(dayOfYear)
quarter <- as.integer(as.character(MYDATA$datetime, format = "Q"))
class(quarter)
sort(unique(quarter))

Listing for This Example

R> dateOnly <- as.ore.date(MYDATA$datetime)
R> class(dateOnly)[1] "ore.date"
attr(,"package")[1] "OREbase"
R> head(sort(unique(dateOnly)),3)
[1] "2001-01-01" "2001-01-02" "2001-01-03"
R> nameOfDay <- as.ore.character(MYDATA$datetime, format = "DAY")
R> class(nameOfDay)
[1] "ore.character"
attr(,"package")
[1] "OREbase"
R> sort(unique(nameOfDay))
[1] "FRIDAY " "MONDAY " "SATURDAY " "SUNDAY " "THURSDAY " "TUESDAY " "WEDNESDAY"
R> dayOfYear <- as.integer(as.character(MYDATA$datetime, format = "DDD"))
R> class(dayOfYear)
[1] "ore.integer"
attr(,"package")
[1] "OREbase"
R> range(dayOfYear)
[1]   1 365
R> quarter <- as.integer(as.character(MYDATA$datetime, format = "Q"))
R> class(quarter)
[1] "ore.integer"
attr(,"package")
[1] "OREbase"
R> sort(unique(quarter))
[1] 1 2 3 4

Example 5-27 Using a Window Function

This example uses the window functions ore.rollmean and ore.rollsd to compute the rolling mean and the rolling standard deviation. The example uses the MYDATA ore.frame object created in the first example. This example ensures that MYDATA is an ordered ore.frame by assigning the values of the datetime column as the row names of MYDATA. The example computes the rolling mean and the rolling standard deviation over five periods. Next, to use the R time series functionality in the stats package, the example pulls data to the client. To limit the data pulled to the client, it uses the vector is.March from the third example to select only the data points in March. The example creates a time series object using the ts function, builds the Arima model, and predicts three points out.

row.names(MYDATA) <- MYDATA$datetime
MYDATA$rollmean5 <- ore.rollmean(MYDATA$x, k = 5)
MYDATA$rollsd5 <- ore.rollsd (MYDATA$x, k = 5)
head(MYDATA)
marchData <- ore.pull(MYDATA[isMarch,])
tseries.x <- ts(marchData$x)
arima110.x <- arima(tseries.x, c(1,1,0))
predict(arima110.x, 3)
tseries.rm5 <- ts(marchData$rollmean5)
arima110.rm5 <- arima(tseries.rm5, c(1,1,0))
predict(arima110.rm5, 3)

Listing for This Example

R> row.names(MYDATA) <- MYDATA$datetime
R> MYDATA$rollmean5 <- ore.rollmean(MYDATA$x, k = 5)
R> MYDATA$rollsd5 <- ore.rollsd (MYDATA$x, k = 5)
R> head(MYDATA)
                               datetime       difftime
2001-01-01 00:00:00 2001-01-01 00:00:00 39.998460 secs
                                                     x   rollmean5   rollsd5
                                            -0.3450421 -0.46650761 0.8057575
                               datetime       difftime
2001-01-01 17:30:25 2001-01-01 17:30:25  37.75568 secs
                                                     x   rollmean5   rollsd5
                                            -1.3261019  0.02877517 1.1891384
                               datetime       difftime
2001-01-02 11:00:50 2001-01-02 11:00:50  18.44243 secs
                                                     x   rollmean5   rollsd5
                                             0.2716211 -0.13224503 1.0909515
                               datetime       difftime
2001-01-03 04:31:15 2001-01-03 04:31:15 38.594384 secs
                                                     x   rollmean5   rollsd5
                                             1.5146235  0.36307913 1.4674456
                               datetime       difftime
2001-01-03 22:01:41 2001-01-03 22:01:41  2.520976 secs
                                                     x   rollmean5   rollsd5
                                            -0.7763258  0.80073340 1.1237925
                               datetime       difftime
2001-01-04 15:32:06 2001-01-04 15:32:06 56.333281 secs 
                                                     x   rollmean5   rollsd5
                                             2.1315787  0.90287282 1.0862614
R> marchData <- ore.pull(MYDATA[isMarch,])
R> tseries.x <- ts(marchData$x)
R> arima110.x <- arima(tseries.x, c(1,1,0))
R> predict(arima110.x, 3)
$pred
Time Series:
Start = 44 
End = 46 
Frequency = 1 
[1] 1.4556614 0.6156379 1.1387587
 
$se
Time Series:
Start = 44 
End = 46 
Frequency = 1 
[1] 1.408117 1.504988 1.850830
 
R> tseries.rm5 <- ts(marchData$rollmean5)
R> arima110.rm5 <- arima(tseries.rm5, c(1,1,0))
R> predict(arima110.rm5, 3)
$pred
Time Series:
Start = 44 
End = 46 
Frequency = 1 
[1] 0.3240135 0.3240966 0.3240922
 
$se
Time Series:
Start = 44 
End = 46 
Frequency = 1 
[1] 0.3254551 0.4482886 0.5445763

Parent topic: Prepare Data in the Database Using Oracle Machine Learning for R