4.2.4 Cross-Tabulate Data
Cross-tabulation is a statistical technique that finds an interdependent relationship between two tables of values.
The ore.crosstab
function enables cross-column analysis of an ore.frame
. This function is a sophisticated variant of the R table
function.
You must use ore.crosstab
function before performing frequency analysis using ore.freq
.
If the result of the ore.crosstab
function invocation is a single cross-tabulation, then the function returns an ore.frame
object. If the result is multiple cross-tabulations, then the function returns a list of ore.frame
objects.
For details about function arguments, call help(ore.crosstab)
.
Example 4-31 Creating a Single Column Frequency Table
The most basic use case is to create a single-column frequency table, as shown in this example.
This example filters the NARROW ore.frame
, grouping by GENDER.
ct <- ore.crosstab(~AGE, data=NARROW) head(ct)
Listing for This Example
R> ct <- ore.crosstab(~AGE, data=NARROW) R> head(ct) AGE ORE$FREQ ORE$STRATA ORE$GROUP 17 17 14 1 1 18 18 16 1 1 19 19 30 1 1 20 20 23 1 1 21 21 22 1 1 22 22 39 1 1
Example 4-32 Analyzing Two Columns
This example analyses AGE by GENDER and AGE by CLASS.
ct <- ore.crosstab(AGE~GENDER+CLASS, data=NARROW) head(ct)
Listing for This Example
R> ct <- ore.crosstab(AGE~GENDER+CLASS, data=NARROW) R> head(ct) $`AGE~GENDER` AGE GENDER ORE$FREQ ORE$STRATA ORE$GROUP 17|F 17 F 5 1 1 17|M 17 M 9 1 1 18|F 18 F 6 1 1 18|M 18 M 7 1 1 19|F 19 F 15 1 1 19|M 19 M 13 1 1 # The remaining output is not shown.
Example 4-33 Weighting Rows
To weight rows, include a count based on another column as shown in this example. This example weights values in AGE and GENDER using values in YRS_RESIDENCE.
ct <- ore.crosstab(AGE~GENDER*YRS_RESIDENCE, data=NARROW) head(ct)
Listing for This Example
R> ct <- ore.crosstab(AGE~GENDER*YRS_RESIDENCE, data=NARROW) R> head(ct) AGE GENDER ORE$FREQ ORE$STRATA ORE$GROUP 17|F 17 F 1 1 1 17|M 17 M 8 1 1 18|F 18 F 4 1 1 18|M 18 M 10 1 1 19|F 19 F 15 1 1 19|M 19 M 17 1 1
Example 4-34 Ordering Cross-Tabulated Data
There are several possibilities for ordering rows in a cross-tabulated table, such as the following:
-
Default or NAME orders by the columns being analyzed
-
FREQ orders by frequency counts
-
-NAME or -FREQ does reverse ordering
-
INTERNAL bypasses ordering
This example orders by frequency count and then by reverse order by frequency count.
ct <- ore.crosstab(AGE~GENDER|FREQ, data=NARROW) head(ct) ct <- ore.crosstab(AGE~GENDER|-FREQ, data=NARROW) head(ct)
Listing for This Example
R> ct <- ore.crosstab(AGE~GENDER|FREQ, data=NARROW) R> head(ct) AGE GENDER ORE$FREQ ORE$STRATA ORE$GROUP 66|F 66 F 1 1 1 70|F 70 F 1 1 1 73|M 73 M 1 1 1 74|M 74 M 1 1 1 76|F 76 F 1 1 1 77|F 77 F 1 1 1 R> ct <- ore.crosstab(AGE~GENDER|-FREQ, data=NARROW) R> head(ct) AGE GENDER ORE$FREQ ORE$STRATA ORE$GROUP 27|M 27 M 33 1 1 35|M 35 M 28 1 1 41|M 41 M 27 1 1 34|M 34 M 26 1 1 37|M 37 M 26 1 1 28|M 28 M 25 1 1
Example 4-35 Analyzing Three or More Columns
This example demonstrates analyzing three or more columns. The result is similar to what the SQL GROUPING SETS
clause accomplishes.
ct <- ore.crosstab(AGE+COUNTRY~GENDER, NARROW) head(ct)
Listing for This Example
R> ct <- ore.crosstab(AGE+COUNTRY~GENDER, NARROW) R> head(ct) $`AGE~GENDER` AGE GENDER ORE$FREQ ORE$STRATA ORE$GROUP 17|F 17 F 5 1 1 17|M 17 M 9 1 1 18|F 18 F 6 1 1 18|M 18 M 7 1 1 19|F 19 F 15 1 1 19|M 19 M 13 1 1 # The rest of the output is not shown. $`COUNTRY~GENDER` COUNTRY GENDER ORE$FREQ ORE$STRATA ORE$GROUP Argentina|F Argentina F 14 1 1 Argentina|M Argentina M 28 1 1 Australia|M Australia M 1 1 1 # The rest of the output is not shown.
Example 4-36 Specifying a Range of Columns
You can specify a range of columns instead of having to type all the column names, as demonstrated in this example.
names(NARROW) # Because AGE, MARITAL_STATUS and COUNTRY are successive columns, # you can simply do the following: ct <- ore.crosstab(AGE-COUNTRY~GENDER, NARROW) # An equivalent invocation is the following: ct <- ore.crosstab(AGE+MARITAL_STATUS+COUNTRY~GENDER, NARROW)
Listing for This Example
R> names(NARROW) [1] "ID" "GENDER" "AGE" "MARITAL_STATUS" [5] "COUNTRY" "EDUCATION" "OCCUPATION" "YRS_RESIDENCE" [9] "CLASS" R> # Because AGE, MARITAL_STATUS and COUNTRY are successive columns, R> # you can simply do the following: R> ct <- ore.crosstab(AGE-COUNTRY~GENDER, NARROW) R> # An equivalent invocation is the following: R> ct <- ore.crosstab(AGE+MARITAL_STATUS+COUNTRY~GENDER, NARROW)
Example 4-37 Producing One Cross-Tabulation Table for Each Value of Another Column
This example produces one cross-tabulation table (AGE, GENDER) for each unique value of another column COUNTRY.
ct <- ore.crosstab(~AGE/COUNTRY, data=NARROW) head(ct)
Listing for This Example
R> ct <- ore.crosstab(~AGE/COUNTRY, data=NARROW) R> head(ct) AGE ORE$FREQ ORE$STRATA ORE$GROUP Argentina|17 17 1 1 1 Brazil|17 17 1 1 3 United States of America|17 17 12 1 19 United States of America|18 18 16 1 19 United States of America|19 19 30 1 19 United States of America|20 20 23 1 19
Example 4-38 Producing One Cross-Tabulation Table for Each Set of Value of Two Columns
You can extend the cross-tabulation to more than one column, as shown in this example, which produces one (AGE, EDUCATION) table for each unique combination of (COUNTRY, GENDER).
ct <- ore.crosstab(AGE~EDUCATION/COUNTRY+GENDER, data=NARROW) head(ct)
Listing for This Example
R> ct <- ore.crosstab(AGE~EDUCATION/COUNTRY+GENDER, data=NARROW) R> head(ct) AGE EDUCATION ORE$FREQ ORE$STRATA ORE$GROUP United States of America|F|17|10th 17 10th 3 1 33 United States of America|M|17|10th 17 10th 5 1 34 United States of America|M|17|11th 17 11th 1 1 34 Argentina|M|17|HS-grad 17 HS-grad 1 1 2 United States of America|M|18|10th 18 10th 1 1 34 United States of America|F|18|11th 18 11th 2 1 33
Example 4-39 Augmenting Cross-Tabulation with Stratification
All of the cross-tabulation tables in the previous examples can be augmented with stratification, as shown in this example.
ct <- ore.crosstab(AGE~GENDER^CLASS, data=NARROW) head(ct) R> head(ct) # The previous function invocation is the same as the following: ct <- ore.crosstab(AGE~GENDER, NARROW, strata="CLASS")
Listing for This Example
R> ct <- ore.crosstab(AGE~GENDER^CLASS, data=NARROW) R> head(ct) R> head(ct) AGE GENDER ORE$FREQ ORE$STRATA ORE$GROUP 0|17|F 17 F 5 1 1 0|17|M 17 M 9 1 1 0|18|F 18 F 6 1 1 0|18|M 18 M 7 1 1 0|19|F 19 F 15 1 1 0|19|M 19 M 13 1 1 # The previous function invocation is the same as the following: ct <- ore.crosstab(AGE~GENDER, NARROW, strata="CLASS")
Example 4-40 Binning Followed by Cross-Tabulation
This example does a custom binning by AGE and then calculates the cross-tabulation for GENDER and the bins.
NARROW$AGEBINS <- ifelse(NARROW$AGE<20, 1, ifelse(NARROW$AGE<30,2, ifelse(NARROW$AGE<40,3,4))) ore.crosstab(GENDER~AGEBINS, NARROW)
Listing for This Example
R> NARROW$AGEBINS <- ifelse(NARROW$AGE<20, 1, ifelse(NARROW$AGE<30,2, + ifelse(NARROW$AGE<40,3,4))) R> ore.crosstab(GENDER~AGEBINS, NARROW) GENDER AGEBINS ORE$FREQ ORE$STRATA ORE$GROUP F|1 F 1 26 1 1 F|2 F 2 108 1 1 F|3 F 3 86 1 1 F|4 F 4 164 1 1 M|1 M 1 29 1 1 M|2 M 2 177 1 1 M|3 M 3 230 1 1 M|4 M 4 381 1 1
Parent topic: Explore Data