9.3.7.3 Simulations Use Case

You can use the ore.indexApply function in simulations, which can take advantage of high-performance computing hardware like an Oracle Exadata Database Machine.

Example 9-14 Using the ore.indexApply Function in a Simulation

This example takes multiple samples from a random normal distribution to compare the distribution of the summary statistics. Each simulation occurs in a separate R engine in the database, in parallel, up to the degree of parallelism allowed by the database. The example defines variables for the sample size, the mean and standard deviations of the random numbers, and the number of simulations to perform. The example specifies num.simulations as the first argument to the ore.indexApply function. The ore.indexApply function passes num.simulations to the user-defined function as the index argument. This input function then sets the random seed based on the index so that each invocation of the input function generates a different set of random numbers.

The input function next uses the rnorm function to produce sample.size random normal values. It calls the summary function on the vector of random numbers, and then prepares a data.frame as the result it returns. The ore.indexApply function specifies the FUN.VALUE argument so that it returns an ore.frame that structures the combined results of the simulations. The res variable gets the ore.frame returned by the ore.indexApply function.

To get the distribution of samples, the example calls the boxplot function on the data.frame that is the result of using the ore.pull function to bring selected columns from res to the client.

%r

options("ore.warn.order" = FALSE)


sample.size = 1000
mean.val = 100
std.dev.val = 10
num.simulations = 10
 
res <- ore.indexApply(num.simulations,
      function(index, sample.size = 1000, mean = 0, std.dev = 1) {
        set.seed(index)
        x <- rnorm(sample.size, mean, std.dev)
        ss <- summary(x)
        attr.names <- attr(ss, "names")
        stats <- data.frame(matrix(ss, 1, length(ss)))
        names(stats) <- attr.names
        stats$index <- index
        stats
      },
      FUN.VALUE=data.frame(Min. = numeric(0),
        "1st Qu." = numeric(0),
        Median = numeric(0),
        Mean = numeric(0),
        "3rd Qu." = numeric(0),
        Max. = numeric(0),
        Index = numeric(0)),
      parallel = TRUE,
      sample.size = sample.size,
      mean = mean.val, std.dev = std.dev.val)
      

head(res, 3)
tail(res, 3)

boxplot(ore.pull(res[, 1:6]),
  main=sprintf("Boxplot of %d rnorm samples size %d, mean=%d, sd=%d",
               num.simulations, sample.size, mean.val, std.dev.val))

The output is similar to the following:

Table 9-10 A data.frame: 3 x 7

Min. X1st.Qu. Median Mean X3rd.Qu. Max. Index
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
69.91951 93.02627 99.64676 99.88352 106.8843 138.1028 1
72.78184 93.68699 100.50135 100.61999 107.7106 130.0882 2
69.43672 93.15461 100.32338 100.06397 106.7667 135.1930 3

Table 9-11 A data.frame: 3 x 7

Min. X1st.Qu. Median Mean X3rd.Qu. Max. Index
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
8 67.18068 92.73174 99.71516 99.58738 106.6340 129.7804 8
9 69.58926 93.51445 100.31074 100.05885 106.6231 127.6253 9
10 69.87836 93.22607 99.96999 100.11375 107.2746 135.4114 10

Listing for This Example

R> res <- ore.indexApply(num.simulations,
+       function(index, sample.size = 1000, mean = 0, std.dev = 1) {
+         set.seed(index)
+         x <- rnorm(sample.size, mean, std.dev)
+         ss <- summary(x)
+         attr.names <- attr(ss, "names")
+         stats <- data.frame(matrix(ss, 1, length(ss)))
+         names(stats) <- attr.names
+         stats$index <- index
+         stats
+       },
+       FUN.VALUE=data.frame(Min. = numeric(0),
+         "1st Qu." = numeric(0),
+         Median = numeric(0),
+         Mean = numeric(0),
+         "3rd Qu." = numeric(0),
+         Max. = numeric(0),
+         Index = numeric(0)),
+       parallel = TRUE,
+       sample.size = sample.size,
+       mean = mean.val, std.dev = std.dev.val)
R> options("ore.warn.order" = FALSE)
R> head(res, 3)
   Min. X1st.Qu. Median   Mean X3rd.Qu.  Max. Index
1 67.56    93.11  99.42  99.30    105.8 128.0   847
2 67.73    94.19  99.86 100.10    106.3 130.7   258
3 65.58    93.15  99.78  99.82    106.2 134.3   264
R> tail(res, 3)
   Min. X1st.Qu. Median   Mean X3rd.Qu.  Max. Index
1 65.02    93.44  100.2 100.20    106.9 134.0     5
2 71.60    93.34   99.6  99.66    106.4 131.7     4
3 69.44    93.15  100.3 100.10    106.8 135.2     3
R> boxplot(ore.pull(res[, 1:6]),
+   main=sprintf("Boxplot of %d rnorm samples size %d, mean=%d, sd=%d",
+                num.simulations, sample.size, mean.val, std.dev.val))