4.2.3 Correlate Data
You can use the ore.corr function to perform correlation analysis.
With the ore.corr function, you can do the following:
- Perform Pearson, Spearman, or Kendall correlation analysis across numeric columns in an ore.frame object.
- Perform partial correlations by specifying a control column.
- Aggregate some data prior to the correlations.
- Post-process results and integrate them into an R code flow.
You can make the output of the ore.corr function conform to the output of the R cor function; doing so allows you to use any R function to post-process the output or to use the output as the input to a graphics function.
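As a rough illustration of what a cor-like shape means, the sketch below reshapes long-format pairs (the ROW, COL, and coefficient columns seen in the Pearson listing of Example 4-29) into a symmetric matrix of the kind that cor returns. The helper pairs_to_matrix is hypothetical and not part of Oracle R Enterprise, and the data frame is a local stand-in typed from that listing rather than a live ore.corr result.

```r
# Hypothetical helper (not part of ORE): reshape long-format pairs into a
# symmetric correlation matrix shaped like the return value of stats::cor().
pairs_to_matrix <- function(pairs, value_col) {
  vars <- unique(c(as.character(pairs$ROW), as.character(pairs$COL)))
  m <- diag(1, length(vars))          # a variable correlates 1 with itself
  dimnames(m) <- list(vars, vars)
  for (i in seq_len(nrow(pairs))) {
    r <- as.character(pairs$ROW[i])
    cc <- as.character(pairs$COL[i])
    m[r, cc] <- pairs[[value_col]][i]  # fill both triangles
    m[cc, r] <- pairs[[value_col]][i]
  }
  m
}

# Local stand-in for the Pearson result shown in Example 4-29.
pearson_pairs <- data.frame(
  ROW = c("AGE", "AGE", "YRS_RESIDENCE"),
  COL = c("CLASS", "YRS_RESIDENCE", "CLASS"),
  PEARSON_T = c(0.2200960, 0.6568534, 0.3561869),
  stringsAsFactors = FALSE
)

corr_matrix <- pairs_to_matrix(pearson_pairs, "PEARSON_T")
print(corr_matrix)
```

A matrix in this shape can be passed to base R functions that expect cor output, such as heatmap or image.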
For details about the function arguments, call help(ore.corr).
The following examples demonstrate these operations.
Example 4-29 Performing Basic Correlation Calculations
This example demonstrates how to specify the different types of correlation statistics.
# Before performing correlations, project out all non-numeric values
# by specifying only the columns that have numeric values.
names(NARROW)
NARROW_NUMS <- NARROW[,c(3,8,9)]
names(NARROW_NUMS)
# Calculate the correlation using the default correlation statistic, Pearson.
x <- ore.corr(NARROW_NUMS,var='AGE,YRS_RESIDENCE,CLASS')
head(x, 3)
# Calculate using Spearman.
x <- ore.corr(NARROW_NUMS,var='AGE,YRS_RESIDENCE,CLASS', stats='spearman')
head(x, 3)
# Calculate using Kendall.
x <- ore.corr(NARROW_NUMS,var='AGE,YRS_RESIDENCE,CLASS', stats='kendall')
head(x, 3)
Listing for This Example
R> # Before performing correlations, project out all non-numeric values
R> # by specifying only the columns that have numeric values.
R> names(NARROW)
 [1] "ID"             "GENDER"         "AGE"            "MARITAL_STATUS" "COUNTRY"        "EDUCATION"      "OCCUPATION"
 [8] "YRS_RESIDENCE"  "CLASS"          "AGEBINS"
R> NARROW_NUMS <- NARROW[,c(3,8,9)]
R> names(NARROW_NUMS)
[1] "AGE"           "YRS_RESIDENCE" "CLASS"
R> # Calculate the correlation using the default correlation statistic, Pearson.
R> x <- ore.corr(NARROW_NUMS,var='AGE,YRS_RESIDENCE,CLASS')
R> head(x, 3)
            ROW           COL PEARSON_T PEARSON_P PEARSON_DF
1           AGE         CLASS 0.2200960     1e-15       1298
2           AGE YRS_RESIDENCE 0.6568534     0e+00       1098
3 YRS_RESIDENCE         CLASS 0.3561869     0e+00       1298
R> # Calculate using Spearman.
R> x <- ore.corr(NARROW_NUMS,var='AGE,YRS_RESIDENCE,CLASS', stats='spearman')
R> head(x, 3)
            ROW           COL SPEARMAN_T SPEARMAN_P SPEARMAN_DF
1           AGE         CLASS  0.2601221      1e-15        1298
2           AGE YRS_RESIDENCE  0.7462684      0e+00        1098
3 YRS_RESIDENCE         CLASS  0.3835252      0e+00        1298
R> # Calculate using Kendall
R> x <- ore.corr(NARROW_NUMS,var='AGE,YRS_RESIDENCE,CLASS', stats='kendall')
R> head(x, 3)
            ROW           COL KENDALL_T    KENDALL_P KENDALL_DF
1           AGE         CLASS 0.2147107 4.285594e-31       <NA>
2           AGE YRS_RESIDENCE 0.6332196 0.000000e+00       <NA>
3 YRS_RESIDENCE         CLASS 0.3362078 1.094478e-73       <NA>
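For comparison, the same three statistics can be computed on a local data frame with base R, which may help when checking results on a small sample pulled from the database. This is a minimal sketch that uses the built-in iris data set instead of the NARROW table, which is not available outside the example schema. The mapping of cor.test fields to the _T, _P, and _DF columns in the listing above is an assumption based on the column names, not something the source states.

```r
# Base R analogue on a local data frame (no database connection required).
nums <- iris[, c("Sepal.Length", "Sepal.Width", "Petal.Length")]

# cor() returns a full symmetric matrix for each statistic.
pearson  <- cor(nums, method = "pearson")
spearman <- cor(nums, method = "spearman")
kendall  <- cor(nums, method = "kendall")
print(pearson)

# cor.test() reports the coefficient, p-value, and degrees of freedom
# for a single pair of columns (assumed to parallel the *_T, *_P, *_DF
# columns in the ore.corr listing).
pt <- cor.test(nums$Sepal.Length, nums$Petal.Length, method = "pearson")
print(pt$estimate)   # correlation coefficient
print(pt$p.value)    # p-value
print(pt$parameter)  # degrees of freedom (n - 2 for Pearson)
```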
Example 4-30 Creating Correlation Matrices
This example pushes the iris data set to a temporary table in the database, which has the proxy ore.frame object iris_of. It creates correlation matrices grouped by species, using Petal.Width as the control column for the partial correlations.
iris_of <- ore.push(iris)
x <- ore.corr(iris_of, var = "Sepal.Length, Sepal.Width, Petal.Length",
              partial = "Petal.Width", group.by = "Species")
class(x)
head(x)
Listing for This Example
R> iris_of <- ore.push(iris)
R> x <- ore.corr(iris_of, var = "Sepal.Length, Sepal.Width, Petal.Length",
+                partial = "Petal.Width", group.by = "Species")
R> class(x)
[1] "list"
R> head(x)
$setosa
           ROW          COL PART_PEARSON_T PART_PEARSON_P PART_PEARSON_DF
1 Sepal.Length Petal.Length      0.1930601   9.191136e-02              47
2 Sepal.Length  Sepal.Width      0.7255823   1.840300e-09              47
3  Sepal.Width Petal.Length      0.1095503   2.268336e-01              47

$versicolor
           ROW          COL PART_PEARSON_T PART_PEARSON_P PART_PEARSON_DF
1 Sepal.Length Petal.Length     0.62696041   7.180100e-07              47
2 Sepal.Length  Sepal.Width     0.26039166   3.538109e-02              47
3  Sepal.Width Petal.Length     0.08269662   2.860704e-01              47

$virginica
           ROW          COL PART_PEARSON_T PART_PEARSON_P PART_PEARSON_DF
1 Sepal.Length Petal.Length      0.8515725   4.000000e-15              47
2 Sepal.Length  Sepal.Width      0.3782728   3.681795e-03              47
3  Sepal.Width Petal.Length      0.2854459   2.339940e-02              47
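To make the partial and group.by arguments more concrete, the following base R sketch computes a partial Pearson correlation per species on the local iris data frame. It uses the standard residual method: regress each variable on the control column, then correlate the residuals. The helper partial_cor is hypothetical, and this is not necessarily how ore.corr computes its result inside the database.

```r
# Hypothetical helper: partial Pearson correlation of x and y given a
# single control column, via correlation of linear-model residuals.
partial_cor <- function(df, x, y, control) {
  rx <- resid(lm(df[[x]] ~ df[[control]]))
  ry <- resid(lm(df[[y]] ~ df[[control]]))
  cor(rx, ry)
}

# Emulate group.by = "Species" by splitting the local data frame.
by_species <- split(iris, iris$Species)

# One partial correlation per species, controlling for Petal.Width.
partials <- sapply(by_species, partial_cor,
                   x = "Sepal.Length", y = "Sepal.Width",
                   control = "Petal.Width")
print(partials)
```

Each species has 50 rows, so a partial correlation with one control column leaves 50 - 3 = 47 degrees of freedom, which matches the PART_PEARSON_DF column in the listing.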