5.1.5 Summarize Data with ore.summary
The ore.summary function calculates descriptive statistics and supports extensive analysis of columns in an ore.frame, along with flexible row aggregations.
The ore.summary function supports these statistics:
- Mean, minimum, maximum, mode, number of missing values, sum, weighted sum
- Corrected and uncorrected sum of squares, range of values, stddev, stderr, variance
- t-test for testing the hypothesis that the population mean is 0
- Kurtosis, skew, coefficient of variation
- Quantiles: p1, p5, p10, p25, p50, p75, p90, p95, p99, qrange
- 1-sided and 2-sided confidence limits for the mean: clm, rclm, lclm
- Extreme value tagging
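To make the statistic names above concrete, the following base R sketch computes several of them on a small local vector. It does not call ore.summary and does not need a database connection. The sample values are hypothetical, and the definitions used here (coefficient of variation as a percentage, qrange as p75 minus p25, R's default quantile method) are assumptions based on common conventions; they might differ in detail from what ore.summary computes.

```r
# Base R sketch on a local numeric vector (not an ore.frame).
x <- c(23, 35, 41, 29, 52, 38, 47, 31)   # hypothetical sample values

n   <- length(x)
m   <- mean(x)
uss <- sum(x^2)           # uncorrected sum of squares
css <- sum((x - m)^2)     # corrected sum of squares
v   <- var(x)             # variance, equal to css / (n - 1)
s   <- sd(x)              # standard deviation
se  <- s / sqrt(n)        # standard error of the mean
cv  <- 100 * s / m        # coefficient of variation (assumed percent form)
rng <- max(x) - min(x)    # range of values

# t statistic and two-sided p-value for H0: population mean = 0
t_stat <- m / se
p_val  <- 2 * pt(-abs(t_stat), df = n - 1)

# Two-sided 95% confidence limits for the mean
clm <- m + c(-1, 1) * qt(0.975, df = n - 1) * se

# Quartiles and interquartile range (assumed meaning of qrange)
q      <- quantile(x, c(0.25, 0.50, 0.75))
qrange <- unname(q[3] - q[1])
```

The t statistic, p-value, and confidence limits computed this way agree with the values that t.test(x) reports for the same vector.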
The ore.summary function provides a relatively simple syntax compared with SQL queries that produce the same results.
The ore.summary function returns an ore.frame in all cases except when the group.by argument is used. If the group.by argument is used, then ore.summary returns a list of ore.frame objects, one ore.frame per stratum.
For details about the function arguments, call help(ore.summary).
Example 5-6 Calculating Default Statistics
This example calculates the mean, minimum, and maximum values for columns AGE and CLASS and rolls up (aggregates) the GENDER column.
ore.summary(NARROW, class = 'GENDER', var = c('AGE', 'CLASS'), order = 'freq')
Example 5-7 Calculating Skew and Probability for t Test
This example calculates the skew and the probability of the Student's t distribution for the AGE and CLASS columns, aggregated by GENDER.
ore.summary(NARROW, class = 'GENDER', var = c('AGE', 'CLASS'), c('skew', 'probt'))
Example 5-8 Calculating the Weighted Sum
This example calculates the weighted sum for AGE aggregated by GENDER with YRS_RESIDENCE as weights; in other words, it calculates sum(var*weight).
ore.summary(NARROW, class = 'GENDER', var = 'AGE', stats = 'sum', weight = 'YRS_RESIDENCE')
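The sum(var*weight) calculation can be checked by hand on a small local data.frame. The following base R sketch uses hypothetical rows that only borrow the column names of NARROW; it illustrates the arithmetic and does not call ore.summary.

```r
# Hypothetical local rows standing in for NARROW (not the real data set).
df <- data.frame(
  GENDER        = c("F", "M", "F", "M", "F"),
  AGE           = c(30, 45, 28, 52, 41),
  YRS_RESIDENCE = c(2, 5, 1, 8, 4)
)

# Weighted sum of AGE per GENDER: sum(AGE * YRS_RESIDENCE) within each group.
wsum <- sapply(split(df, df$GENDER),
               function(g) sum(g$AGE * g$YRS_RESIDENCE))

wsum
# F: 30*2 + 28*1 + 41*4 = 252
# M: 45*5 + 52*8        = 641
```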
Example 5-9 Grouping by Two Columns
This example groups CLASS by GENDER and MARITAL_STATUS.
ore.summary(NARROW, class = c('GENDER', 'MARITAL_STATUS'), var = 'CLASS', ways = 1)
Example 5-10 Grouping by All Possible Ways
This example groups CLASS in all possible ways by GENDER and MARITAL_STATUS.
ore.summary(NARROW, class = c('GENDER', 'MARITAL_STATUS'), var = 'CLASS', ways = 0:length(NARROW['CLASS']))
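For two grouping columns, grouping in all possible ways means four aggregations: no grouping, each column alone, and both columns together. The following base R sketch spells these out on a hypothetical local data.frame, using the mean as the statistic. It is an illustration of the idea only; it does not call ore.summary, and the rows are made up.

```r
# Hypothetical local rows; only the column names come from NARROW.
df <- data.frame(
  GENDER         = c("F", "M", "F", "M", "F", "M"),
  MARITAL_STATUS = c("Married", "Married", "NeverM",
                     "NeverM", "Married", "NeverM"),
  CLASS          = c(1, 0, 0, 1, 1, 0)
)

overall    <- mean(df$CLASS)                                      # no grouping
by_gender  <- aggregate(CLASS ~ GENDER, data = df, FUN = mean)    # one column
by_marital <- aggregate(CLASS ~ MARITAL_STATUS, data = df, FUN = mean)
by_both    <- aggregate(CLASS ~ GENDER + MARITAL_STATUS,
                        data = df, FUN = mean)                    # both columns
```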
Example 5-11 Getting the Maximum Values of Columns Using ore.summary
This example lists the maximum value and corresponding species of the Sepal.Length and Sepal.Width columns in the IRIS ore.frame.
IRIS <- ore.push(iris)
ore.summary(IRIS, c("Sepal.Length", "Sepal.Width"),
            "max",
            maxid=c(Sepal.Length="Species", Sepal.Width="Species"))
Listing for This Example
R> IRIS <- ore.push(iris)
R> ore.summary(IRIS, c("Sepal.Length", "Sepal.Width"),
+              "max",
+              maxid=c(Sepal.Length="Species", Sepal.Width="Species"))
  FREQ MAX(Sepal.Length) MAX(Sepal.Width) MAXID(Sepal.Length->Species) MAXID(Sepal.Width->Species)
1  150               7.9              4.4                    virginica                      setosa
Warning message:
ORE object has no unique key - using random order
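As a cross-check, the same maximum values and species can be found on the local iris data.frame with base R. This sketch does not use ore.summary or the maxid argument; it looks up the row that holds each maximum with which.max.

```r
# Base R cross-check on the local iris data.frame (no database needed).
cols <- c("Sepal.Length", "Sepal.Width")

res <- do.call(rbind, lapply(cols, function(col) {
  i <- which.max(iris[[col]])            # row index of the column maximum
  data.frame(column  = col,
             max     = iris[[col]][i],
             species = as.character(iris$Species[i]))
}))

res
# Sepal.Length: max 7.9, species virginica
# Sepal.Width:  max 4.4, species setosa
```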