5.2.3 Correlate Data
You can use the ore.corr function to perform correlation analysis. With the ore.corr function, you can do the following:
- Perform Pearson, Spearman, or Kendall correlation analysis across numeric columns in an ore.frame object.
- Perform partial correlations by specifying a control column.
- Aggregate some data prior to computing the correlations.
- Post-process results and integrate them into an R code flow.
You can make the output of the ore.corr function conform to the output of the R cor function. Doing so allows you to use any R function to post-process the output or to use the output as the input to a graphics function.
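This section does not show which ore.corr option produces cor-style output, so the following is only a base R sketch of the idea, not the ORE mechanism. It uses a hypothetical helper, long_to_cor_matrix (not part of Oracle R Enterprise), that reshapes a local data.frame in the ROW/COL layout shown in the listings into a symmetric matrix like the one cor returns. The three Pearson coefficients are copied from the Pearson listing in Example 5-29.

```r
# Hypothetical helper (not an ORE function): turn long-format pairwise output
# with ROW, COL, and a coefficient column into a symmetric cor()-style matrix.
long_to_cor_matrix <- function(df, value_col) {
  vars <- sort(unique(c(as.character(df$ROW), as.character(df$COL))))
  m <- diag(1, length(vars))              # a variable correlates 1 with itself
  dimnames(m) <- list(vars, vars)
  for (i in seq_len(nrow(df))) {
    r_name <- as.character(df$ROW[i])
    c_name <- as.character(df$COL[i])
    m[r_name, c_name] <- df[[value_col]][i]   # fill both triangles
    m[c_name, r_name] <- df[[value_col]][i]
  }
  m
}

# Pearson coefficients copied from the Example 5-29 listing.
pearson_long <- data.frame(
  ROW = c("AGE", "AGE", "YRS_RESIDENCE"),
  COL = c("CLASS", "YRS_RESIDENCE", "CLASS"),
  PEARSON_T = c(0.2200960, 0.6568534, 0.3561869)
)

m <- long_to_cor_matrix(pearson_long, "PEARSON_T")
print(round(m, 3))

# In matrix form the result can feed functions that expect cor() output,
# for example: heatmap(m, symm = TRUE)
```

The helper works on a local data.frame; if a result is still an ore.frame proxy, it would first have to be pulled into R memory.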
For details about the function arguments, call help(ore.corr).
The following examples demonstrate these operations.
Example 5-29 Performing Basic Correlation Calculations
This example demonstrates how to specify the different types of correlation statistics.
# Before performing correlations, project out all non-numeric values
# by specifying only the columns that have numeric values.
names(NARROW)
NARROW_NUMS <- NARROW[,c(3,8,9)]
names(NARROW_NUMS)
# Calculate the correlation using the default correlation statistic, Pearson.
x <- ore.corr(NARROW_NUMS,var='AGE,YRS_RESIDENCE,CLASS')
head(x, 3)
# Calculate using Spearman.
x <- ore.corr(NARROW_NUMS,var='AGE,YRS_RESIDENCE,CLASS', stats='spearman')
head(x, 3)
# Calculate using Kendall.
x <- ore.corr(NARROW_NUMS,var='AGE,YRS_RESIDENCE,CLASS', stats='kendall')
head(x, 3)
Listing for This Example
R> # Before performing correlations, project out all non-numeric values
R> # by specifying only the columns that have numeric values.
R> names(NARROW)
[1] "ID" "GENDER" "AGE" "MARITAL_STATUS" "COUNTRY" "EDUCATION" "OCCUPATION"
[8] "YRS_RESIDENCE" "CLASS" "AGEBINS"
R> NARROW_NUMS <- NARROW[,c(3,8,9)]
R> names(NARROW_NUMS)
[1] "AGE" "YRS_RESIDENCE" "CLASS"
R> # Calculate the correlation using the default correlation statistic, Pearson.
R> x <- ore.corr(NARROW_NUMS,var='AGE,YRS_RESIDENCE,CLASS')
R> head(x, 3)
ROW COL PEARSON_T PEARSON_P PEARSON_DF
1 AGE CLASS 0.2200960 1e-15 1298
2 AGE YRS_RESIDENCE 0.6568534 0e+00 1098
3 YRS_RESIDENCE CLASS 0.3561869 0e+00 1298
R> # Calculate using Spearman.
R> x <- ore.corr(NARROW_NUMS,var='AGE,YRS_RESIDENCE,CLASS', stats='spearman')
R> head(x, 3)
ROW COL SPEARMAN_T SPEARMAN_P SPEARMAN_DF
1 AGE CLASS 0.2601221 1e-15 1298
2 AGE YRS_RESIDENCE 0.7462684 0e+00 1098
3 YRS_RESIDENCE CLASS 0.3835252 0e+00 1298
R> # Calculate using Kendall
R> x <- ore.corr(NARROW_NUMS,var='AGE,YRS_RESIDENCE,CLASS', stats='kendall')
R> head(x, 3)
ROW COL KENDALL_T KENDALL_P KENDALL_DF
1 AGE CLASS 0.2147107 4.285594e-31 <NA>
2 AGE YRS_RESIDENCE 0.6332196 0.000000e+00 <NA>
3 YRS_RESIDENCE CLASS 0.3362078 1.094478e-73 <NA>
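NARROW lives in the database, so as a purely local point of comparison the following sketch uses base R's cor.test on the built-in iris data (an assumption; iris is not part of this example) to compute the same three statistics for one pair of columns. For Pearson, cor.test reports n - 2 degrees of freedom, and the *_DF columns above appear to follow a similar convention; this sketch does not establish how ore.corr itself computes its p-values.

```r
# Base R only: iris stands in for NARROW, which is not available locally.
sl <- iris$Sepal.Length
pl <- iris$Petal.Length

results <- lapply(c("pearson", "spearman", "kendall"), function(m) {
  # exact = FALSE uses asymptotic p-values and avoids tie warnings
  # for the rank-based methods.
  ct <- cor.test(sl, pl, method = m, exact = FALSE)
  data.frame(
    METHOD = m,
    COEF = unname(ct$estimate),
    P = ct$p.value,
    # cor.test reports a df parameter only for Pearson (n - 2).
    DF = if (is.null(ct$parameter)) NA_real_ else unname(ct$parameter)
  )
})
res <- do.call(rbind, results)
print(res)
```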
Example 5-30 Creating Correlation Matrices
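This example pushes the iris data set to a temporary table in the database, which has the proxy ore.frame object iris_of. It then computes partial Pearson correlations among Sepal.Length, Sepal.Width, and Petal.Length, controlling for Petal.Width, grouped by Species. In this case the result is a list with one element per species.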
iris_of <- ore.push(iris)
x <- ore.corr(iris_of, var = "Sepal.Length, Sepal.Width, Petal.Length",
              partial = "Petal.Width", group.by = "Species")
class(x)
head(x)
Listing for This Example
R> iris_of <- ore.push(iris)
R> x <- ore.corr(iris_of, var = "Sepal.Length, Sepal.Width, Petal.Length",
+ partial = "Petal.Width", group.by = "Species")
R> class(x)
[1] "list"
R> head(x)
$setosa
ROW COL PART_PEARSON_T PART_PEARSON_P PART_PEARSON_DF
1 Sepal.Length Petal.Length 0.1930601 9.191136e-02 47
2 Sepal.Length Sepal.Width 0.7255823 1.840300e-09 47
3 Sepal.Width Petal.Length 0.1095503 2.268336e-01 47
$versicolor
ROW COL PART_PEARSON_T PART_PEARSON_P PART_PEARSON_DF
1 Sepal.Length Petal.Length 0.62696041 7.180100e-07 47
2 Sepal.Length Sepal.Width 0.26039166 3.538109e-02 47
3 Sepal.Width Petal.Length 0.08269662 2.860704e-01 47
$virginica
ROW COL PART_PEARSON_T PART_PEARSON_P PART_PEARSON_DF
1 Sepal.Length Petal.Length 0.8515725 4.000000e-15 47
2 Sepal.Length Sepal.Width 0.3782728 3.681795e-03 47
3 Sepal.Width Petal.Length 0.2854459 2.339940e-02 47
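A partial correlation of two columns controlling for a third can be computed by correlating the residuals of each column regressed on the control column. The following base R sketch applies that residual technique per species to the local iris data frame. partial_cor is a hypothetical helper, not an ORE function, and the sketch is meant only for comparison with the listing above, not as a description of how ore.corr works internally. With k control columns the degrees of freedom are n - 2 - k, which gives 47 for each 50-row species, the same value as the PART_PEARSON_DF column.

```r
# Hypothetical helper: partial Pearson correlation of a and b controlling
# for z, computed as the correlation of the residuals of a ~ z and b ~ z.
partial_cor <- function(a, b, z) {
  cor(resid(lm(a ~ z)), resid(lm(b ~ z)))
}

vars <- c("Sepal.Length", "Sepal.Width", "Petal.Length")
var_pairs <- combn(vars, 2)   # each column is one (ROW, COL) pair

by_species <- lapply(split(iris, iris$Species), function(d) {
  data.frame(
    ROW = var_pairs[1, ],
    COL = var_pairs[2, ],
    PART_R = apply(var_pairs, 2, function(p)
      partial_cor(d[[p[1]]], d[[p[2]]], d$Petal.Width)),
    DF = nrow(d) - 2 - 1      # n - 2 - (number of control columns)
  )
})
print(by_species)
```

Splitting the local data frame by Species mirrors the role of group.by: each list element holds the pairwise results for one group.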