10.1.3 Support for Parallel Execution
Some Oracle Machine Learning for R (OML4R) Embedded R Execution functions support parallel execution in the database.
The ore.groupApply, ore.rowApply, rqGroupEval2, and rqRowEval2 functions support data-parallel execution, and the ore.indexApply function supports task-parallel execution. This parallel execution capability enables a script to take advantage of high-performance computing hardware such as an Oracle Exadata Database Machine.
The parallel argument of the ore.groupApply, ore.rowApply, and ore.indexApply functions specifies the degree of parallelism to use in the Embedded R Execution. The value of the argument can be one of the following:
- A positive integer greater than or equal to 2 for a specific degree of parallelism
- FALSE or 1 for no parallelism
- TRUE for the default parallelism of the data argument
- NULL for the database default for the operation
The default value of the argument is the value of the global option ore.parallel, or FALSE if ore.parallel is not set.
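The rules above can be captured in a small R helper. This is a sketch for illustration and is not part of the OML4R API. describe_parallel() is a hypothetical name. It maps a value of the parallel argument to the behavior the list describes. Like the documented default, it falls back to the global ore.parallel option, or to FALSE when that option is unset. In a real OML4R session you would instead pass the value directly, for example parallel = 4 in a call to ore.groupApply, or set a session-wide default with options(ore.parallel = 4).

```r
# Hypothetical helper (not part of OML4R): interprets a value of the
# `parallel` argument according to the documented rules.
# Requires R >= 3.5.0 for isFALSE().
describe_parallel <- function(parallel = getOption("ore.parallel", FALSE)) {
  if (is.null(parallel)) {
    "database default for the operation"
  } else if (isTRUE(parallel)) {
    "default parallelism of the data argument"
  } else if (isFALSE(parallel)) {
    "no parallelism"
  } else if (is.numeric(parallel) && length(parallel) == 1L &&
             !is.na(parallel) && parallel == round(parallel) && parallel >= 1) {
    # 1 means no parallelism; 2 or more is an explicit degree of parallelism
    if (parallel == 1) "no parallelism"
    else sprintf("degree of parallelism %d", as.integer(parallel))
  } else {
    stop("`parallel` must be TRUE, FALSE, NULL, or a positive integer")
  }
}

describe_parallel(4L)    # returns "degree of parallelism 4"
describe_parallel(NULL)  # returns "database default for the operation"
```

Values such as 0 or 2.5 fall outside the documented list, so the sketch rejects them with an error instead of guessing.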
A user-defined R function invoked using ore.doEval or ore.tableApply is not executed in parallel; the function runs in a single R engine.
For the rqGroupEval2 and rqRowEval2 functions, the degree of parallelism is specified by a PARALLEL hint in the input cursor argument.
In data-parallel execution for the ore.groupApply and rqGroupEval2 functions, one or more R engines run the same R function, or task, on different partitions of the data. Each partition holds the rows that share a value of the grouping column. This makes it practical to build large numbers of models, for example tens or hundreds of thousands of predictive models, one model per customer.
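The partitioning can be illustrated with a single-machine analogue in base R. It assumes only the parallel package that ships with R. group_apply_local() is a hypothetical helper, not an OML4R function. It splits a data frame into one partition per group value and runs the same function on each partition in forked R processes. It mimics the data-parallel pattern only. In the database, ore.groupApply (for example, ore.groupApply(X, INDEX, FUN, parallel = 4)) does the partitioning and runs the R engines at the server. The iris species here stand in for customers.

```r
library(parallel)

# Hypothetical helper (not part of OML4R): a single-machine analogue of
# data-parallel group execution. Each distinct value of `index` defines one
# partition, and the same function runs on every partition.
group_apply_local <- function(df, index, FUN, cores = 2L) {
  partitions <- split(df, index)  # named list, one data frame per group value
  # mclapply relies on forking, which is unavailable on Windows.
  if (.Platform$OS.type == "windows") cores <- 1L
  mclapply(partitions, FUN, mc.cores = cores)
}

# One linear model per species, standing in for "one model per customer".
models <- group_apply_local(iris, iris$Species, function(dat) {
  lm(Sepal.Length ~ Petal.Length, data = dat)
})
names(models)  # the three species: setosa, versicolor, virginica
```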
In data-parallel execution for the ore.rowApply and rqRowEval2 functions, one or more R engines perform the same R function on disjoint chunks of data. This functionality enables scalable model scoring and prediction on large data sets.
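Chunked scoring can be sketched the same way with base R's parallel package. row_apply_local() is a hypothetical helper, not part of OML4R. It cuts the rows into disjoint chunks of at most `rows` rows and applies a scoring function to each chunk. In OML4R, the rows argument of ore.rowApply plays the corresponding role. This local version only shows the pattern and does not reproduce in-database execution.

```r
library(parallel)

# Hypothetical helper (not part of OML4R): a single-machine analogue of
# row-chunk execution. Rows are cut into disjoint chunks of at most `rows`
# rows, and the same function runs on every chunk.
row_apply_local <- function(df, FUN, rows = 50L, cores = 2L) {
  chunk_id <- ceiling(seq_len(nrow(df)) / rows)  # 1,1,...,2,2,... in row order
  chunks <- split(df, chunk_id)
  if (.Platform$OS.type == "windows") cores <- 1L  # no forking on Windows
  mclapply(chunks, FUN, mc.cores = cores)
}

# Fit once, then score the data set chunk by chunk.
fit <- lm(Sepal.Length ~ Petal.Length, data = iris)
scored <- row_apply_local(iris, function(chunk) predict(fit, newdata = chunk),
                          rows = 40L)
predictions <- unlist(scored, use.names = FALSE)  # chunks are in row order
```

The chunks are disjoint, so concatenating their results in chunk order gives the same predictions as scoring the whole data set at once.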
In task-parallel execution for the ore.indexApply function, one or more R engines perform the same or different calculations, or tasks. Each invocation of the function receives a number, the index of that execution, which the function can use to vary its work. This functionality is valuable in a variety of operations, such as performing simulations.
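The role of the index can be sketched with a local analogue in base R. index_apply_local() is a hypothetical helper, not part of OML4R. It calls a function once for each index from 1 to times and passes the index as the first argument. This is roughly the shape of ore.indexApply(times, FUN, parallel = ...) in OML4R. The small Monte Carlo simulation uses each index as its random seed, so every task draws a different but reproducible sample.

```r
library(parallel)

# Hypothetical helper (not part of OML4R): a single-machine analogue of
# task-parallel execution. The function is called once per index 1..times,
# and each call receives its index as the first argument.
index_apply_local <- function(times, FUN, cores = 2L) {
  if (.Platform$OS.type == "windows") cores <- 1L  # no forking on Windows
  mclapply(seq_len(times), FUN, mc.cores = cores)
}

# Each task seeds its own random number stream from its index.
simulate_mean <- function(index) {
  set.seed(index)
  mean(rnorm(1000))
}

results <- index_apply_local(4L, simulate_mean)
estimates <- unlist(results)  # one sample mean per task
```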
Oracle Database manages and controls the R engines at the database server, of which there may be several. It automatically partitions the data and passes it to the R engines that execute in parallel. It ensures that the R function executions for all of the partitions complete; if any of them does not, the OML4R function returns an error. The results of the individual executions of the user-defined R function are gathered in an ore.list, which remains in the database until the user retrieves the result.
Embedded R Execution also allows data-parallel execution of user-defined R functions that use functions from an open source R package on The Comprehensive R Archive Network (CRAN) or from another third-party R package. However, third-party packages do not themselves leverage in-database parallelism and are subject to the parallelism constraints of R. They can still benefit from the data-parallel and task-parallel execution supported in Embedded R Execution, because each R engine can run the package functions on its own partition of data or its own task.