10.3.6 Use the ore.rowApply Function
The ore.rowApply function calls an R script with an ore.frame as the input data. The function passes the ore.frame to the user-defined function as the first argument to that function. The rows argument to the ore.rowApply function specifies the number of rows to pass to each invocation of the user-defined R function. The last chunk of rows may have fewer rows than the number specified. The ore.rowApply function can use data-parallel execution, in which one or more R engines perform the same R function, or task, on different partitions of data.
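The chunking behavior described above can be illustrated locally with base R. This is only a sketch of the semantics of the rows argument, not how ore.rowApply is implemented: the helper row_apply_local and the function count_rows are hypothetical names introduced for illustration, and the code runs in the client R session without a database connection.

```r
# Local, base-R emulation of row chunking; NOT the OML4R implementation.
# row_apply_local and count_rows are hypothetical names used for illustration.
row_apply_local <- function(X, FUN, ..., rows = 1) {
  # Assign each row to a chunk: rows 1..rows get id 1, the next block id 2, etc.
  chunk_id <- ceiling(seq_len(nrow(X)) / rows)
  chunks <- split(X, chunk_id)
  # Call FUN once per chunk, passing the chunk as the first argument.
  lapply(chunks, FUN, ...)
}

# User-defined function that reports how many rows it received.
count_rows <- function(dat) nrow(dat)

# iris has 150 rows, so rows = 40 yields three full chunks and a shorter last one.
sizes <- unlist(row_apply_local(iris, count_rows, rows = 40))
print(sizes)
```

Because each chunk is processed independently of the others, the per-chunk calls are the units of work that a data-parallel run can distribute across R engines.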
The syntax of the ore.rowApply function is the following:
ore.rowApply(X, FUN, ..., FUN.VALUE = NULL, FUN.NAME = NULL, rows = 1, FUN.OWNER = NULL, parallel = getOption("ore.parallel", NULL))
The ore.rowApply function returns an ore.list object or an ore.frame object.
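The two result shapes can be pictured with a local, base-R analogy: either one result per chunk, or a single combined table whose columns match a template such as the one supplied through FUN.VALUE. The sketch below is an assumption-laden illustration and does not call OML4R; apply_by_rows, ratio_fun, and conforms are hypothetical helpers, and the template is an ordinary zero-row data.frame.

```r
# Hypothetical helper: split a data.frame into chunks of `rows` rows and apply FUN.
apply_by_rows <- function(X, FUN, rows) {
  lapply(split(X, ceiling(seq_len(nrow(X)) / rows)), FUN)
}

# Hypothetical user-defined function that returns two columns per chunk.
ratio_fun <- function(dat) {
  data.frame(Species = as.character(dat$Species),
             ratio = dat$Petal.Length / dat$Petal.Width,
             stringsAsFactors = FALSE)
}

# Without a template: one result per chunk (loosely analogous to an ore.list).
as_list <- apply_by_rows(iris, ratio_fun, rows = 50)

# With a template: a zero-row data.frame that names the expected columns and types.
template <- data.frame(Species = character(), ratio = numeric(),
                       stringsAsFactors = FALSE)

# Check that a chunk result has the same column names and classes as the template.
conforms <- function(res) {
  identical(names(res), names(template)) &&
    identical(sapply(res, class), sapply(template, class))
}
stopifnot(all(vapply(as_list, conforms, logical(1))))

# Combine the conforming chunks into one table (loosely analogous to an ore.frame).
as_frame <- do.call(rbind, unname(as_list))
str(as_frame)
```

Writing the template as a zero-row data.frame keeps the column names and types in one place, which is the same form the example in this section uses for its FUN.VALUE argument.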
Example 10-11 Using the ore.rowApply Function
This example builds a linear regression model on the iris data set, saves the model to a datastore, and scores it on chunks of rows with the ore.rowApply function. The example does the following:
- Pushes the iris data set to the database as the IRIS temporary table and ore.frame object.
- Builds the linear model mod on the local iris data.frame. The model predicts Petal.Length from Petal.Width, Sepal.Width, and Sepal.Length.
- Saves mod to the datastore ds-1.
- Defines the user-defined function scoreLM.1, which does the following:
  - Loads the contents of the datastore named by the dsname argument so that mod is available to the R engine or engines that run in the database.
  - Calls the predict method and adds the predictions to the chunk of rows as the Petal.Length_prediction column.
  - Returns the Petal.Length_prediction, Petal.Length, and Species columns.
- Saves scoreLM.1 to the R script repository.
- Calls the ore.rowApply function, passing the IRIS ore.frame as the data source for the user-defined R function and the user-defined R function itself. The call also passes the datastore name in the dsname argument, specifies the number of rows to pass to each invocation of the user-defined function, and requests two parallel R engines.
- Displays the first six records of the result.
- Calls the ore.rowApply function again, this time passing a data.frame as the argument FUN.VALUE, which specifies the structure of the object that the ore.rowApply function returns, and displays the class of the result.
%r
# Push the iris data.frame to the database as a temporary table and
# create its ore.frame proxy object.
IRIS <- ore.push(iris)

# Build a model using the local data.frame.
mod <- lm(Petal.Length ~ Petal.Width + Sepal.Width + Sepal.Length, data = iris)

# Save the model to the datastore.
ore.save(mod, "mod", name = "ds-1", overwrite = TRUE)

# Create a user-defined function that loads the model residing in the
# datastore and scores the model on new data.
scoreLM.1 <- function(dat, dsname) {
  ore.load(dsname)
  dat$Petal.Length_prediction <- predict(mod, newdata = dat)
  dat[, c("Petal.Length_prediction", "Petal.Length", "Species")]
}

# Save the user-defined scoring function in the R script repository.
ore.scriptCreate(name = 'scoreLM.1',
                 FUN = scoreLM.1,
                 overwrite = TRUE)

# Run the scoring function on chunks of 10 rows, specifying the desired
# number of parallel R engines using the parallel argument.
# View the first 6 records of the result.
res1 <- ore.rowApply(IRIS,
                     scoreLM.1,
                     dsname = "ds-1",
                     rows = 10,
                     parallel = 2)
head(res1)

# Run the function again, this time specifying FUN.VALUE to describe
# the structure of the returned object.
res2 <- ore.rowApply(IRIS,
                     scoreLM.1,
                     dsname = "ds-1",
                     rows = 10,
                     parallel = 2,
                     FUN.VALUE = data.frame(Petal.Length_prediction = numeric(),
                                            Petal.Length = numeric(),
                                            Species = character()))
class(res2)
The output is similar to the following:
Table 10-8 A data.frame: 6 x 3
| | Petal.Length_prediction | Petal.Length | Species |
|---|---|---|---|
| | <dbl> | <dbl> | <chr> |
| 1 | 1.484210 | 1.4 | setosa |
| 2 | 1.661389 | 1.4 | setosa |
| 3 | 1.386358 | 1.3 | setosa |
| 4 | 1.378046 | 1.5 | setosa |
| 5 | 1.346695 | 1.4 | setosa |
| 6 | 1.733905 | 1.7 | setosa |
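Before running a scoring function in the database, it can be useful to check its logic in the client R session. The following base-R sketch is one way to do that, under stated assumptions: score_local is a hypothetical stand-in for scoreLM.1 that receives the model as an argument instead of loading it from a datastore, and the chunks are produced with split rather than by ore.rowApply.

```r
# Fit the same linear model as in the example, using the local iris data.frame.
mod <- lm(Petal.Length ~ Petal.Width + Sepal.Width + Sepal.Length, data = iris)

# Hypothetical local stand-in for scoreLM.1: the model is passed in directly
# rather than loaded from a datastore with ore.load.
score_local <- function(dat, model) {
  dat$Petal.Length_prediction <- predict(model, newdata = dat)
  dat[, c("Petal.Length_prediction", "Petal.Length", "Species")]
}

# Emulate rows = 10 by splitting the data into chunks of 10 rows.
chunks <- split(iris, ceiling(seq_len(nrow(iris)) / 10))
scored <- do.call(rbind, unname(lapply(chunks, score_local, model = mod)))

# Scoring chunk by chunk should agree with scoring the whole data set at once.
whole <- unname(predict(mod, newdata = iris))
stopifnot(isTRUE(all.equal(unname(scored$Petal.Length_prediction), whole)))

head(scored)
```

Because the model's predictions for a row do not depend on the other rows in the chunk, the chunk size affects only how the work is divided, not the predicted values.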