10.5.4 rqGroupEval Function
The rqGroupEval function is a user-defined function that identifies a grouping column.
The user defines an rqGroupEval function in PL/SQL using the SQL object rqGroupEvalImpl, which is a generic implementation of the group apply functionality in SQL. The implementation supports data-parallel execution, in which one or more R engines perform the same R function, or task, on different partitions of data. The data is partitioned according to the values of the grouping column.
Only one grouping column is supported. If you have multiple columns, then combine the columns into one column and use the new column as the grouping column.
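One way to combine columns is a view that concatenates them into a single value. The following is a minimal sketch, not a prescribed method. It assumes the CHURN_TRAIN table used later in this section has case-sensitive columns named "state" and "area_code". The view name CHURN_TRAIN_GRP and the column name "state_area" are hypothetical.

```sql
-- Hypothetical view that adds one combined grouping column.
-- Assumes CHURN_TRAIN has case-sensitive columns "state" and "area_code".
CREATE OR REPLACE VIEW CHURN_TRAIN_GRP AS
SELECT t.*,
       t."state" || '_' || t."area_code" AS "state_area"
  FROM CHURN_TRAIN t;
```

With such a view, the input cursor would select from CHURN_TRAIN_GRP, and the new column would be the one named in the GRP_COL argument and in the partitioning clauses of the user-defined function.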
The rqGroupEval function executes the R function in the script specified by the EXP_NAM parameter. You pass data to the R function with the INP_CUR parameter. You can pass arguments to the R function with the PAR_CUR parameter.
The R function returns an R data.frame object, which appears as a SQL table in the database. You define the form of the returned value with the OUT_QRY parameter.
To create an rqGroupEval function, you create the following two PL/SQL objects:
- A PL/SQL package that specifies the types of the result to return.
- A function that uses the type defined in the package and specifies PIPELINED PARALLEL_ENABLE with a PARTITION clause to indicate the column on which to partition data.
Syntax
rqGroupEval (
     INP_CUR     REF CURSOR  IN
     PAR_CUR     REF CURSOR  IN
     OUT_QRY     VARCHAR2    IN
     GRP_COL     VARCHAR2    IN
     EXP_NAM     VARCHAR2    IN)
Parameters
Parameter | Description |
---|---|
INP_CUR | A cursor that specifies the data to pass to the R function specified by the EXP_NAM parameter. |
PAR_CUR | A cursor that contains argument values to pass to the R function. |
OUT_QRY | The form of the returned value. For example, 'XML' returns the value in XML format. |
GRP_COL | The name of the grouping column by which to partition the data. |
EXP_NAM | The name of a script in the OML4R script repository. |
Return Value
The user-defined rqGroupEval function returns a table that has the structure specified by the OUT_QRY parameter value.
Examples
This example has a PL/SQL block that drops the script myC5.0Function to ensure that the script does not exist in the OML4R script repository. It then creates a function and stores it as the script myC5.0Function in the script repository.
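If sys.rqScriptDrop raises an error when the named script does not exist (an assumption; this section does not say how the procedure behaves in that case), the drop can be isolated in its own block so that the rest of the setup still runs. A minimal sketch:

```sql
-- Drop the script if present. Assumption: a failure here means the
-- script was not in the repository, so the error is ignored.
BEGIN
  sys.rqScriptDrop('myC5.0Function');
EXCEPTION
  WHEN OTHERS THEN
    NULL;
END;
/
```

The WHEN OTHERS handler hides every error, not only a missing script, so narrow it to the specific error your release reports once you know it.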
The R function accepts two arguments: the data on which to operate and a prefix to use in creating datastores. The function uses the C50 package to build C5.0 models on the churn data set from C50. The function builds one churn model on the data for each state.
The myC5.0Function function loads the C50 package so that the function body has access to it when the function executes in an R engine on the database server. The function then creates a datastore name using the datastore prefix and the name of a state. To exclude the state name from the model, the function deletes the column from the data.frame. Because factors in the data.frame are converted to character vectors when they are loaded in the user-defined embedded R function, the myC5.0Function function explicitly converts the character vectors back to R factors.
The myC5.0Function function gets the data for the state from the specified columns and then creates a model for the state and saves the model in a datastore. The R function returns TRUE to have a simple value that can appear as the result of the function execution.
The example next creates a PL/SQL package, churnPkg, and a user-defined function, churnGroupEval. In defining an rqGroupEval function implementation, the PARALLEL_ENABLE clause is optional but the CLUSTER BY clause is required.
Finally, the example executes a SELECT statement that invokes the churnGroupEval function. In the INP_CUR argument of the churnGroupEval function, the SELECT statement specifies the PARALLEL hint to use parallel execution of the R function and the data set to pass to the R function. The PAR_CUR argument of the churnGroupEval function specifies connecting to OML4R and the datastore prefix to pass to the R function. The OUT_QRY argument specifies returning the value in XML format, the GRP_COL argument specifies using the state column of the data set as the grouping column, and the EXP_NAM argument specifies the myC5.0Function script in the script repository as the R function to invoke.
For each of 50 states plus Washington, D.C., the SELECT statement returns from the churnGroupEval table function the name of the state and an XML string that contains the value TRUE.
Example 10-22 Using an rqGroupEval Function
BEGIN
  sys.rqScriptDrop('myC5.0Function');
  sys.rqScriptCreate('myC5.0Function',
    'function(dat, datastorePrefix) {
       library(C50)
       datastoreName <- paste(datastorePrefix, dat[1, "state"], sep = "_")
       dat$state <- NULL
       dat$churn <- as.factor(dat$churn)
       dat$area_code <- as.factor(dat$area_code)
       dat$international_plan <- as.factor(dat$international_plan)
       dat$voice_mail_plan <- as.factor(dat$voice_mail_plan)
       mod <- C5.0(churn ~ ., data = dat, rules = TRUE)
       ore.save(mod, name = datastoreName)
       TRUE
     }');
END;
/

CREATE OR REPLACE PACKAGE churnPkg AS
  TYPE cur IS REF CURSOR RETURN CHURN_TRAIN%ROWTYPE;
END churnPkg;
/

CREATE OR REPLACE FUNCTION churnGroupEval(
  inp_cur  churnPkg.cur,
  par_cur  SYS_REFCURSOR,
  out_qry  VARCHAR2,
  grp_col  VARCHAR2,
  exp_txt  CLOB)
RETURN SYS.AnyDataSet
PIPELINED PARALLEL_ENABLE (PARTITION inp_cur BY HASH ("state"))
CLUSTER inp_cur BY ("state")
USING rqGroupEvalImpl;
/

SELECT *
  FROM table(churnGroupEval(
    cursor(SELECT /*+ parallel(t,4) */ * FROM CHURN_TRAIN t),
    cursor(SELECT 1 AS "ore.connect",
                  'myC5.0model' AS "datastorePrefix" FROM dual),
    'XML', 'state', 'myC5.0Function'));