9.5.6 rqRowEval Function
The rqRowEval
function executes the R function in the script specified by the EXP_NAM
parameter.
You pass data to the R function with the INP_CUR
parameter. You can pass arguments to the R function with the PAR_CUR
parameter. The ROW_NUM
parameter specifies the number of rows that should be passed to each invocation of the R function. The last chunk may have fewer rows than the number specified.
The rqRowEval
function supports data-parallel execution, in which one or more R engines perform the same R function, or task, on disjoint chunks of data. Oracle Database handles the management and control of the potentially multiple R engines that run on the database server machine, automatically chunking and passing data to the R engines executing in parallel. Oracle Database ensures that R function executions for all chunks of rows complete, or the rqRowEval
function returns an error.
The R function returns an R data.frame
object, which appears as a SQL table in the database. You define the form of the returned value with the OUT_QRY
parameter.
Syntax
rqRowEval
(
INP_CUR REF CURSOR IN
PAR_CUR REF CURSOR IN
OUT_QRY VARCHAR2 IN
ROW_NUM NUMBER IN
EXP_NAM VARCHAR2 IN)
Parameters
Table 9-17 Parameters of the rqRowEval Function
Parameter | Description |
---|---|
|
A cursor that specifies the data to pass to the R function specified by the |
|
A cursor that contains argument values to pass to the R function. |
|
One of the following:
|
|
The number of rows to include in each invocation of the R function. |
|
The name of a script in the OML4R script repository. |
Return Value
Function rqRowEval
returns a table that has the structure specified by the OUT_QRY
parameter value.
Examples
This example uses the C50 package to score churn data (that is, to predict which customers are likely to churn) using C5.0 decision tree models. The example scores the customers from the specified state in parallel. This example produces the same result as the invocation of function ore.rowApply
.
Tip:
This example uses the CHURN_TEST table and the myXLevels
datastore. So in R you should invoke the functions that create the table and that get the xlevels
object and save it in the myXLevels
datastore before running this example.
Example 9-24 Using an rqRowEval Function
This example creates a user-defined function and saves the function in the OML4R script repository. The user-defined function creates a C5.0 model for a state and saves the model in a datastore. The function myC5.0FunctionForLevels
returns the value TRUE
.
This example creates the PL/SQL package churnPkg
and the function churnGroupEval
. The example declares a cursor to get the names of the datastores that include the string myC5.0modelFL
and then executes a PL/SQL block that deletes those datastores. The example next executes a SELECT
statement that invokes the churnGroupEval
function. The churnGroupEval
function invokes the myC5.0FunctionForLevels
function to generate the C5.0 models and save them in datastores.
The example then creates the myScoringFunction
function and stores it in the script repository. The function scores a C5.0 model for the levels of a state and returns the results in a data.frame
.
Finally, the example executes a SELECT
statement that invokes the rqRowEval
function. The input cursor to the rqRowEval
function uses the PARALLEL
hint to specify the degree of parallelism to use. The cursor specifies the CHURN_TEST table as the data source and filters the rows to include only those for Massachusetts. All rows processed use the same predictive model.
The parameters cursor specifies the ore.connect
control argument to connect to OML4R on the database server and specifies values for the datastorePrefix
and xlevelsDatastore
arguments to the myScoringFunction
function.
The SELECT
statement for the OUT_QRY
parameter specifies the format of the output. The ROW_NUM
parameter specifies 200 as the number of rows to process at a time in each parallel R engine. The EXP_NAME
parameter specifies myScoringFunction
in the script repository as the R function to invoke.
BEGIN sys.rqScriptDrop('myC5.0FunctionForLevels'); sys.rqScriptCreate('myC5.0FunctionForLevels', 'function(dat, xlevelsDatastore, datastorePrefix) { library(C50) state <- dat[1,"state"] datastoreName <- paste(datastorePrefix, dat[1, "state"], sep = "_") dat$state <- NULL ore.load(name = xlevelsDatastore) # To get the xlevels object. for (j in names(xlevels)) dat[[j]] <- factor(dat[[j]], levels = xlevels[[j]]) c5mod <- C5.0(churn ~ ., data = dat, rules = TRUE) ore.save(c5mod, name = datastoreName) TRUE }'); END; / CREATE OR REPLACE PACKAGE churnPkg AS TYPE cur IS REF CURSOR RETURN CHURN_TEST%ROWTYPE; END churnPkg; / CREATE OR REPLACE FUNCTION churnGroupEval( inp_cur churnPkg.cur, par_cur SYS_REFCURSOR, out_qry VARCHAR2, grp_col VARCHAR2, exp_txt CLOB) RETURN SYS.AnyDataSet PIPELINED PARALLEL_ENABLE (PARTITION inp_cur BY HASH ("state")) CLUSTER inp_cur BY ("state") USING rqGroupEvalImpl; / DECLARE CURSOR c1 IS SELECT dsname FROM RQUSER_DATASTORELIST WHERE dsname like 'myC5.0modelFL%'; BEGIN FOR dsname_st IN c1 LOOP rqDropDataStore(dsname_st.dsname); END LOOP; END; SELECT * FROM table(churnGroupEval( cursor(SELECT * /*+ parallel(t,4) */ FROM CHURN_TEST t), cursor(SELECT 1 AS "ore.connect", 'myXLevels' as "xlevelsDatastore", 'myC5.0modelFL' AS "datastorePrefix" FROM dual), 'XML', 'state', 'myC5.0FunctionForLevels')); BEGIN sys.rqScriptDrop('myScoringFunction'); sys.rqScriptCreate('myScoringFunction', 'function(dat, xlevelsDatastore, datastorePrefix) { library(C50) state <- dat[1, "state"] datastoreName <- paste(datastorePrefix, state, sep = "_") dat$state <- NULL ore.load(name = xlevelsDatastore) # To get the xlevels object. for (j in names(xlevels)) dat[[j]] <- factor(dat[[j]], levels = xlevels[[j]]) ore.load(name = datastoreName) res <- data.frame(pred = predict(c5mod, dat, type = "class"), actual= dat$churn, state = state) res }'); END; / SELECT * FROM table(rqRowEval( cursor(select /*+ parallel(t, 4) */ * FROM CHURN_TEST t WHERE "state" = 'MA'), cursor(SELECT 1 as "ore.connect", 'myC5.0modelFL' as "datastorePrefix", 'myXLevels' as "xlevelsDatastore" FROM dual), 'SELECT ''aaa'' "pred",''aaa'' "actual" , ''aa'' "state" FROM dual', 200, 'myScoringFunction'));
In Oracle SQL Developer, the results of the last SELECT
statement are:
pred actual state ---- ------ ----- no no MA no no MA no no MA no no MA no no MA no no MA no no MA no yes MA yes yes MA yes yes MA no no MA no no MA no no MA no no MA no no MA no no MA yes yes MA no no MA no no MA no no MA no no MA no no MA no no MA no no MA no no MA no no MA no no MA no no MA no no MA no no MA no yes MA no no MA no no MA no no MA yes yes MA no no MA no no MA no no MA 38 rows selected