9.3.7.2 Column-Parallel Use Case
The example uses the R summary function to compute, in parallel, summary statistics for the first four columns of the iris data set, all of which are numeric.
Example 9-13 Using the ore.indexApply Function and Combining Results
The example combines the computations into a final result. The first argument to the ore.indexApply function is 4, which specifies the number of times to invoke the function, here once for each column to summarize in parallel. The user-defined input function takes one argument, index, which is a value from 1 to 4 that specifies the column to summarize.
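As a quick check of the index-to-column mapping, the following sketch uses only base R on the client (it does not call ore.indexApply) to show which iris columns are numeric and which column names the index values 1 through 4 select. The variable names numeric_cols and idx are illustrative and do not appear in the example.

```r
# Base R only: inspect the built-in iris data set on the client.
# numeric_cols and idx are illustrative names, not part of the example.

# TRUE for each column that holds numeric data
numeric_cols <- vapply(iris, is.numeric, logical(1))
print(numeric_cols)

# The index values that four invocations would receive
idx <- seq_len(4)

# The column each index value selects
print(names(iris)[idx])
```

The fifth column, Species, is a factor, which is why the example stops at index 4.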
The function calls summary on the specified column. For a numeric column, the summary invocation returns six named statistics: the minimum, first quartile, median, mean, third quartile, and maximum. The example converts that result into a single-row data.frame and adds the column name to it.
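The per-column step can be tried locally before it is run in the database. The following is a minimal sketch in base R that repeats the body of the user-defined function for a single, hard-coded index value; fixing index to 1 is an assumption made for illustration.

```r
# Base R only: one invocation of the function body, with index fixed at 1
index <- 1

# Six named statistics for the selected column
ss <- summary(iris[, index])

# Reshape the statistics into a one-row data.frame and restore their names
stats <- data.frame(matrix(ss, 1, length(ss)))
names(stats) <- names(ss)

# Record which column the statistics describe
stats$col <- names(iris)[index]

str(stats)
```

Run locally, the row keeps the names that summary assigns, such as 1st Qu., and the lowercase col column.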
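The example next uses the FUN.VALUE argument to the ore.indexApply function to define the structure of the result of the function. The value supplied is a data.frame with zero rows whose column names and types describe one row of the result. The result is then returned as an ore.frame object with that structure.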
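To see what structure the FUN.VALUE argument describes, the template can be built and inspected locally. The following sketch uses base R only; the variable name template is illustrative. It shows that data.frame rewrites names such as 1st Qu. into syntactically valid names, which matches the X1st.Qu. and X3rd.Qu. headings in the output of the example.

```r
# Base R only: build the same zero-row template that is passed as FUN.VALUE.
# The name 'template' is illustrative.
template <- data.frame(Min. = numeric(0),
                       "1st Qu." = numeric(0),
                       Median = numeric(0),
                       Mean = numeric(0),
                       "3rd Qu." = numeric(0),
                       Max. = numeric(0),
                       Col = character(0))

# Column names after data.frame makes them syntactically valid
print(names(template))

# The template carries structure only, no data
print(nrow(template))

# First class of each column
print(vapply(template, function(x) class(x)[1], character(1)))
```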
%r
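res <- ore.indexApply(4,
  function(index) {
    # Summarize the column selected by this invocation's index
    ss <- summary(iris[, index])
    attr.names <- attr(ss, "names")
    # Reshape the statistics into a one-row data.frame
    stats <- data.frame(matrix(ss, 1, length(ss)))
    names(stats) <- attr.names
    # Record which column the statistics describe (shown as Col in the output)
    stats$col <- names(iris)[index]
    stats
  },
  # Zero-row template that defines the columns of the returned ore.frame
  FUN.VALUE = data.frame(Min. = numeric(0),
                         "1st Qu." = numeric(0),
                         Median = numeric(0),
                         Mean = numeric(0),
                         "3rd Qu." = numeric(0),
                         Max. = numeric(0),
                         Col = character(0)),
  parallel = TRUE)
res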
The output is similar to the following:
Table 9-9 A data.frame: 4 x 7
Min. | X1st.Qu. | Median | Mean | X3rd.Qu. | Max. | Col |
---|---|---|---|---|---|---|
<dbl> | <dbl> | <dbl> | <dbl> | <dbl> | <dbl> | <chr> |
4.3 | 5.1 | 5.80 | 5.843333 | 6.4 | 7.9 | Sepal.Length |
2.0 | 2.8 | 3.00 | 3.057333 | 3.3 | 4.4 | Sepal.Width |
1.0 | 1.6 | 4.35 | 3.758000 | 5.1 | 6.9 | Petal.Length |
0.1 | 0.3 | 1.30 | 1.199333 | 1.8 | 2.5 | Petal.Width |
Listing for This Example
R> res <- ore.indexApply(4,
+       function(index) {
+         ss <- summary(iris[, index])
+         attr.names <- attr(ss, "names")
+         stats <- data.frame(matrix(ss, 1, length(ss)))
+         names(stats) <- attr.names
+         stats$col <- names(iris)[index]
+         stats
+       },
+       FUN.VALUE=data.frame(Min. = numeric(0),
+                            "1st Qu." = numeric(0),
+                            Median = numeric(0),
+                            Mean = numeric(0),
+                            "3rd Qu." = numeric(0),
+                            Max. = numeric(0),
+                            Col = character(0)),
+       parallel = TRUE)
R> res
  Min. X1st.Qu. Median  Mean X3rd.Qu. Max.          Col
1  2.0      2.8   3.00 3.057      3.3  4.4  Sepal.Width
2  4.3      5.1   5.80 5.843      6.4  7.9 Sepal.Length
3  0.1      0.3   1.30 1.199      1.8  2.5  Petal.Width
4  1.0      1.6   4.35 3.758      5.1  6.9 Petal.Length
Warning message:
ORE object has no unique key - using random order
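The warning in the listing indicates that the rows are returned in no guaranteed order. If a stable order matters, one option is to reorder the rows on the client. The following is a minimal sketch in base R, assuming the result has already been brought into a local data.frame (for example, with ore.pull). The local_res object is a hypothetical stand-in whose values are copied from the listing, with only three of the seven columns kept for brevity.

```r
# Hypothetical local stand-in for the pulled result; values are copied
# from the listing, in the order in which the listing shows them.
local_res <- data.frame(
  Min. = c(2.0, 4.3, 0.1, 1.0),
  Max. = c(4.4, 7.9, 2.5, 6.9),
  Col  = c("Sepal.Width", "Sepal.Length", "Petal.Width", "Petal.Length"),
  stringsAsFactors = FALSE
)

# Reorder the rows to follow the column order of iris
ordered <- local_res[match(names(iris)[1:4], local_res$Col), ]
rownames(ordered) <- NULL

print(ordered)
```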