6.3 Build Linear Regression Models
The ore.lm and ore.stepwise functions perform least squares regression and stepwise least squares regression, respectively, on data represented in an ore.frame object.
A model fit is generated using embedded R map/reduce operations, where the map operation creates either QR decompositions or matrix cross-products, depending on the number of coefficients being estimated. The underlying model matrices are created using either a model.matrix or sparse.model.matrix object, depending on the sparsity of the model. Once the coefficients for the model have been estimated, another pass over the data is made to estimate the model-level statistics.
When forward, backward, or stepwise selection is performed, the XtX and Xty matrices (the cross-products X'X and X'y) are subset to generate the F-test p-values, based on coefficient estimates that were generated using a Cholesky decomposition of the XtX subset matrix.
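The following base R sketch shows one way an F-test p-value for adding a single term could be computed from subsets of XtX and Xty through a Cholesky factor. It is an illustration under stated assumptions, not the OML4R code: it runs on the local longley data frame with three predictors, the helper rss_subset is hypothetical, and the choice of base and candidate terms is arbitrary.

```r
# Base R only: no database connection and no OML4R functions are used.
X <- model.matrix(~ GNP + Unemployed + Armed.Forces, data = longley)
y <- longley$Employed
n <- nrow(X)

# Cross-products computed once for all candidate terms.
XtX <- crossprod(X)
Xty <- drop(crossprod(X, y))
yty <- sum(y^2)

# Residual sum of squares for the model that uses only the columns in `cols`,
# obtained from the subset cross-products via a Cholesky factor.
rss_subset <- function(cols) {
  R <- chol(XtX[cols, cols, drop = FALSE])        # XtX_sub = t(R) %*% R
  z <- backsolve(R, Xty[cols], transpose = TRUE)  # solves t(R) %*% z = Xty_sub
  yty - sum(z^2)                                  # RSS = y'y - z'z
}

base_cols <- c("(Intercept)", "GNP")
full_cols <- c(base_cols, "Unemployed")

rss0 <- rss_subset(base_cols)
rss1 <- rss_subset(full_cols)
df_resid <- n - length(full_cols)

# Partial F-test for adding one term (1 numerator degree of freedom).
f_stat <- (rss0 - rss1) / (rss1 / df_resid)
p_value <- pf(f_stat, df1 = 1, df2 = df_resid, lower.tail = FALSE)
print(c(F = f_stat, p = p_value))
```

In a selection procedure of this kind, a p-value like this would be compared against a threshold such as the add.p or drop.p argument of ore.stepwise. No rows are revisited when a different subset of columns is tested, because only the stored cross-products are indexed.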
If there are collinear terms in the model, the ore.lm and ore.stepwise functions do not estimate the coefficient values for a collinear set of terms. For ore.stepwise, a collinear set of terms is excluded throughout the procedure.
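As a rough illustration of what a collinear set of terms looks like, the base R sketch below adds a hypothetical column, Combo, that is an exact linear combination of two predictors in the local longley data frame. How OML4R detects collinearity internally is not described here. The sketch only shows the analogous behavior of base R, where the rank of the model matrix falls below its column count and lm leaves the redundant coefficient unestimated.

```r
# Base R only: no database connection and no OML4R functions are used.
d <- longley[, c("Employed", "GNP", "Unemployed")]

# Hypothetical redundant column: an exact linear combination of two others.
d$Combo <- d$GNP + 2 * d$Unemployed

X <- model.matrix(Employed ~ ., data = d)
qrX <- qr(X)

# The rank is smaller than ncol(X) when terms are collinear.
rank_X <- qrX$rank

# Columns pivoted past the rank are the ones treated as redundant.
aliased <- colnames(X)[qrX$pivot[seq_len(ncol(X)) > rank_X]]

# Base lm reports NA for a coefficient it does not estimate.
fit <- lm(Employed ~ ., data = d)
na_terms <- names(coef(fit))[is.na(coef(fit))]

print(list(rank = rank_X, columns = ncol(X), aliased = aliased, na_terms = na_terms))
```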
For more information on ore.lm and ore.stepwise, invoke help(ore.lm).
Example 6-2 Using ore.lm
This example pushes the longley data set to a temporary database table that has the proxy ore.frame object longley_of. The example builds a linear regression model using ore.lm.
longley_of <- ore.push(longley)
# Fit full model
oreFit1 <- ore.lm(Employed ~ ., data = longley_of)
class(oreFit1)
summary(oreFit1)
Listing for This Example
R> longley_of <- ore.push(longley)
R> # Fit full model
R> oreFit1 <- ore.lm(Employed ~ ., data = longley_of)
R> class(oreFit1)
[1] "ore.lm" "ore.model" "lm"
R> summary(oreFit1)
Call:
ore.lm(formula = Employed ~ ., data = longley_of)

Residuals:
     Min       1Q   Median       3Q      Max
-0.41011 -0.15767 -0.02816  0.10155  0.45539

Coefficients:
               Estimate Std. Error t value Pr(>|t|)
(Intercept)  -3.482e+03  8.904e+02  -3.911 0.003560 **
GNP.deflator  1.506e-02  8.492e-02   0.177 0.863141
GNP          -3.582e-02  3.349e-02  -1.070 0.312681
Unemployed   -2.020e-02  4.884e-03  -4.136 0.002535 **
Armed.Forces -1.033e-02  2.143e-03  -4.822 0.000944 ***
Population   -5.110e-02  2.261e-01  -0.226 0.826212
Year          1.829e+00  4.555e-01   4.016 0.003037 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.3049 on 9 degrees of freedom
Multiple R-squared:  0.9955,    Adjusted R-squared:  0.9925
F-statistic: 330.3 on 6 and 9 DF,  p-value: 4.984e-10
Example 6-3 Using the ore.stepwise Function
This example pushes the longley data set to a temporary database table that has the proxy ore.frame object longley_of. The example builds linear regression models using the ore.stepwise function.
longley_of <- ore.push(longley)
# Two stepwise alternatives
oreStep1 <- ore.stepwise(Employed ~ .^2, data = longley_of, add.p = 0.1, drop.p = 0.1)
oreStep2 <- step(ore.lm(Employed ~ 1, data = longley_of),
                 scope = terms(Employed ~ .^2, data = longley_of))
Listing for This Example
R> longley_of <- ore.push(longley)
R> # Two stepwise alternatives
R> oreStep1 <-
+ ore.stepwise(Employed ~ .^2, data = longley_of, add.p = 0.1, drop.p = 0.1)
R> oreStep2 <-
+ step(ore.lm(Employed ~ 1, data = longley_of),
+ scope = terms(Employed ~ .^2, data = longley_of))
Start:  AIC=41.17
Employed ~ 1

               Df Sum of Sq     RSS     AIC
+ GNP           1   178.973   6.036 -11.597
+ Year          1   174.552  10.457  -2.806
+ GNP.deflator  1   174.397  10.611  -2.571
+ Population    1   170.643  14.366   2.276
+ Unemployed    1    46.716 138.293  38.509
+ Armed.Forces  1    38.691 146.318  39.411
<none>                      185.009  41.165

Step:  AIC=-11.6
Employed ~ GNP

               Df Sum of Sq     RSS     AIC
+ Unemployed    1     2.457   3.579 -17.960
+ Population    1     2.162   3.874 -16.691
+ Year          1     1.125   4.911 -12.898
<none>                        6.036 -11.597
+ GNP.deflator  1     0.212   5.824 -10.169
+ Armed.Forces  1     0.077   5.959  -9.802
- GNP           1   178.973 185.009  41.165
... The rest of the output is not shown.