10.1.3 Support for Parallel Execution

Some of the Oracle Machine Learning for R Embedded R Execution functions support the use of parallel execution in the database.

The ore.groupApply, ore.rowApply, rqGroupEval2, and rqRowEval2 functions support data-parallel execution, and the ore.indexApply function supports task-parallel execution. This parallel execution capability enables a script to take advantage of high-performance computing hardware such as an Oracle Exadata Database Machine.

The parallel argument of the ore.groupApply, ore.rowApply, and ore.indexApply functions specifies the degree of parallelism to use in Embedded R Execution. The value of the argument can be one of the following:

A positive integer greater than or equal to 2 for a specific degree of parallelism
FALSE or 1 for no parallelism
TRUE for the default parallelism of the data argument
NULL for the database default for the operation

The default value of the argument is the value of the global option ore.parallel or FALSE if ore.parallel is not set.
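The rules above can be sketched in plain R. This is a minimal illustration, not part of OML4R: effective_parallel, describe_parallel, and rows_per_species are hypothetical helper names. The last helper assumes that the ORE packages are attached and that a database connection is open, so it is only defined here, not called.

```r
# Default resolution as described above: the value of the global option
# ore.parallel, or FALSE when the option is not set.
effective_parallel <- function() getOption("ore.parallel", FALSE)

# Hypothetical helper that maps a value of the parallel argument to the
# meaning listed above. It only restates the documented rules.
describe_parallel <- function(parallel) {
  if (is.null(parallel)) {
    return("database default for the operation")
  }
  if (is.logical(parallel) && length(parallel) == 1 && !is.na(parallel)) {
    return(if (parallel) "default parallelism of the data argument"
           else "no parallelism")
  }
  if (is.numeric(parallel) && length(parallel) == 1 && !is.na(parallel) &&
      parallel >= 1 && parallel == round(parallel)) {
    return(if (parallel == 1) "no parallelism"
           else sprintf("degree of parallelism %d", as.integer(parallel)))
  }
  stop("unsupported value for 'parallel'")
}

# Hypothetical wrapper, defined but not run here. Assumes library(ORE) has
# been called and ore.connect() has opened a session.
rows_per_species <- function(degree = 4) {
  IRIS <- ore.push(iris)                   # temporary database table
  old <- options(ore.parallel = degree)    # session-wide default
  on.exit(options(old), add = TRUE)
  # Picks up the session default set above.
  by_default <- ore.groupApply(IRIS, IRIS$Species, function(dat) nrow(dat))
  # An explicit argument overrides the option for this call only.
  serial <- ore.groupApply(IRIS, IRIS$Species, function(dat) nrow(dat),
                           parallel = FALSE)
  list(by_default = ore.pull(by_default), serial = ore.pull(serial))
}
```

Setting options(ore.parallel = ...) once per session keeps individual calls short, while an explicit parallel argument documents the intent at the call site.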

A user-defined R function invoked using ore.doEval or ore.tableApply is not executed in parallel. The function executes in a single R engine.

For the rqGroupEval2 and rqRowEval2 functions, the degree of parallelism is specified by a PARALLEL hint in the input cursor argument.

In data-parallel execution for the ore.groupApply and rqGroupEval2 functions, one or more R engines perform the same R function, or task, on different partitions of data. This functionality enables the building of large numbers of models, for example building tens or hundreds of thousands of predictive models, one model per customer.
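As a minimal sketch of the one-model-per-partition pattern, the following uses the built-in iris data set in place of customer data. The names fit_group and fit_in_database are hypothetical. The local split/lapply call is only a serial analogue for testing the function before it is sent to the database; the wrapper assumes attached ORE packages and an open connection and is not called here.

```r
# Per-partition function: receives the rows of one group as a data.frame
# and returns one fitted model.
fit_group <- function(dat) {
  lm(Sepal.Length ~ Petal.Length, data = dat)
}

# Local, serial analogue in base R: one model per Species value.
local_models <- lapply(split(iris, iris$Species), fit_group)

# Hypothetical wrapper, defined but not run here. Assumes library(ORE) and
# an open ore.connect() session.
fit_in_database <- function(degree = 2) {
  IRIS <- ore.push(iris)
  # The database partitions IRIS by Species and runs fit_group once per
  # partition, using up to 'degree' R engines.
  ore.groupApply(IRIS, INDEX = IRIS$Species, FUN = fit_group,
                 parallel = degree)
}
```

Testing the function locally on a small sample first tends to make failures easier to diagnose than errors raised from inside a database-side R engine.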

In data-parallel execution for the ore.rowApply and rqRowEval2 functions, one or more R engines perform the same R function on disjoint chunks of data. This functionality enables scalable model scoring and predictions on large data sets.
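Chunked scoring might look like the following sketch. The names score_chunk and score_in_database, the PRED column, and the chunk size of 50 rows are illustrative assumptions. The local part emulates chunking with base R; the wrapper assumes attached ORE packages and an open connection and is not called here.

```r
# A model built ahead of time; it is passed to every chunk as an argument.
model <- lm(Sepal.Length ~ Petal.Length, data = iris)

# Per-chunk function: receives a block of rows and appends predictions.
score_chunk <- function(dat, mod) {
  dat$PRED <- predict(mod, newdata = dat)
  dat
}

# Local, serial analogue: split the rows into disjoint chunks of 50.
chunk_id <- ceiling(seq_len(nrow(iris)) / 50)
local_scored <- do.call(
  rbind,
  lapply(split(iris, chunk_id), score_chunk, mod = model)
)

# Hypothetical wrapper, defined but not run here. Assumes library(ORE) and
# an open ore.connect() session.
score_in_database <- function(degree = 2, chunk_rows = 50) {
  IRIS <- ore.push(iris)
  # 'rows' sets the chunk size; 'mod' is forwarded to score_chunk.
  ore.rowApply(IRIS, score_chunk, mod = model,
               rows = chunk_rows, parallel = degree)
}
```

Because each chunk is scored independently, the per-chunk function should not rely on seeing rows from any other chunk.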

In task-parallel execution for the ore.indexApply function, one or more R engines perform the same or different calculations, or tasks. A number, associated with the index of the execution, is provided to the function. This functionality is valuable in a variety of operations, such as in performing simulations.
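A simulation driven by the execution index could be sketched as follows. The names simulate_mean and simulate_in_database are hypothetical, and using the index as a random seed is one possible design choice, not something OML4R requires. The wrapper assumes attached ORE packages and an open connection and is not called here.

```r
# Task function: the first argument is the index of the execution.
# Seeding with the index makes each task reproducible and distinct.
simulate_mean <- function(i, n = 1000) {
  set.seed(i)
  mean(rnorm(n))
}

# Local, serial analogue: run tasks 1 through 10 one after another.
local_results <- lapply(1:10, simulate_mean)

# Hypothetical wrapper, defined but not run here. Assumes library(ORE) and
# an open ore.connect() session.
simulate_in_database <- function(times = 10, degree = 2) {
  # Runs simulate_mean for indexes 1..times; 'n' is forwarded to it.
  ore.indexApply(times, simulate_mean, n = 1000, parallel = degree)
}
```

To run different calculations per task, the function can branch on the index, for example with switch(i, ...).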

Oracle Database handles the management and control of potentially multiple R engines at the database server, automatically partitioning and passing data to R engines executing in parallel. The database ensures that all of the R function executions for all of the partitions complete; if they do not, the OML4R function returns an error. The result from the execution of each user-defined embedded R function is gathered in an ore.list. This list remains in the database until the user requests the result.
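The all-or-error behavior and the retrieval of results can be illustrated with a sketch. The names mean_petal, run_all, and collect_from_database are hypothetical, and the local lapply call is only an analogue of how a single failing partition fails the whole call. The wrapper assumes attached ORE packages and an open connection and is not called here.

```r
# Per-partition function that fails on an empty partition (illustrative).
mean_petal <- function(dat) {
  if (nrow(dat) == 0) stop("empty partition")
  mean(dat$Petal.Length)
}

# Local analogue: if any partition fails, no partial result is returned,
# only the error message.
run_all <- function(parts) {
  tryCatch(lapply(parts, mean_petal),
           error = function(e) conditionMessage(e))
}
ok <- run_all(split(iris, iris$Species))
failed <- run_all(list(a = iris, b = iris[0, ]))

# Hypothetical wrapper, defined but not run here. Assumes library(ORE) and
# an open ore.connect() session.
collect_from_database <- function(degree = 2) {
  IRIS <- ore.push(iris)
  res <- tryCatch(
    ore.groupApply(IRIS, IRIS$Species, mean_petal, parallel = degree),
    error = function(e) {
      stop("embedded execution failed: ", conditionMessage(e))
    }
  )
  # res is an ore.list held in the database; ore.pull retrieves it as an
  # ordinary R list only when the result is needed locally.
  ore.pull(res)
}
```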

Embedded R Execution also allows data-parallel execution of user-defined R functions that use functions from an open source R package from The Comprehensive R Archive Network (CRAN) or from another third-party R package. Third-party packages do not themselves leverage in-database parallelism and remain subject to the parallelism constraints of R. However, a user-defined function that calls them can still benefit from the data-parallel and task-parallel execution that Embedded R Execution supports, because the database runs that function in multiple R engines.
