Spatial Fixed Effects
The spatial fixed effects algorithm computes an intercept or constant parameter for each regime, while the other model parameters remain constant. It is a simplified version of the spatial regimes algorithm.
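In equation form, and using notation that is an assumption of this note rather than the library's own, the model for observation i, belonging to regime r(i), with p explanatory variables can be written as:

```latex
y_i = \alpha_{r(i)} + \sum_{j=1}^{p} \beta_j x_{ij} + \varepsilon_i
```

Only the intercept \alpha depends on the regime; the coefficients \beta_j are shared by all regimes.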
The SpatialFixedEffectsRegressor class fits a regression model in which each regime has its own constant parameter, while the remaining parameters are shared across all regimes. To predict new values, it looks up the constant parameter for each observation's regime and uses it in the regression equation along with the shared parameters. You can also pass the spatial_weights_definition parameter to obtain spatial diagnostics, which help you analyze the input features and fine-tune the model.
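The idea of regime-specific constants with shared coefficients can be illustrated with plain NumPy. The following is a minimal sketch, not the SpatialFixedEffectsRegressor implementation: the helper names fit_fixed_effects and predict_fixed_effects are hypothetical, and the sketch assumes ordinary least squares with one indicator column per regime.

```python
import numpy as np

def fit_fixed_effects(X, y, regimes):
    """Estimate one intercept per regime and a shared slope vector by OLS (hypothetical helper)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    labels, codes = np.unique(regimes, return_inverse=True)
    # One indicator (dummy) column per regime replaces the single global constant
    dummies = np.eye(len(labels))[codes]
    design = np.hstack([dummies, X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    intercepts = dict(zip(labels.tolist(), coef[:len(labels)]))
    slopes = coef[len(labels):]
    return intercepts, slopes

def predict_fixed_effects(X, regimes, intercepts, slopes):
    """Look up each observation's regime intercept and add the shared linear term."""
    X = np.asarray(X, dtype=float)
    alpha = np.array([intercepts[r] for r in regimes])
    return alpha + X @ slopes

# Synthetic, noise-free data: two regimes with intercepts 10 and 20 and a shared slope of 3
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 1))
regimes = ["a"] * 100 + ["b"] * 100
true_alpha = np.array([10.0] * 100 + [20.0] * 100)
y = true_alpha + 3.0 * X[:, 0]

intercepts, slopes = fit_fixed_effects(X, y, regimes)
y_hat = predict_fixed_effects(X, regimes, intercepts, slopes)
print(intercepts, slopes)
```

Because the data contain no noise, the recovered intercepts and slope match the values used to generate them up to floating-point rounding.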
The following table describes the main methods of the SpatialFixedEffectsRegressor class.
Method | Description |
---|---|
fit | Trains the model. The regimes parameter indicates the categorical variable used as regime. The intercept parameter of the linear equation is different for each regime, while the rest of the parameters remain constant. |
predict | To predict new values, the algorithm gets the intercept of the linear equation from the corresponding regime (according to the regimes parameter) and uses it along with the other parameters. |
fit_predict | Calls the fit and predict methods sequentially with the training data. |
score | Returns the R-squared statistic for the given data. For each observation, it uses the intercept associated with the corresponding regime, according to the regimes parameter. |
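The way score combines a regime lookup with the R-squared statistic can be sketched as follows. This is an illustration only: the fitted intercepts, the slopes, and the data are made-up values, and the sketch assumes the usual coefficient of determination, 1 minus the ratio of the residual sum of squares to the total sum of squares.

```python
import numpy as np

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

# Hypothetical fitted parameters: one intercept per regime, slopes shared by all regimes
intercepts = {1: 5.0, 2: 8.0}
slopes = np.array([2.0, -1.0])

# Made-up observations with their regime labels and observed target values
X_new = np.array([[1.0, 0.0], [0.0, 1.0], [2.0, 2.0], [1.0, 3.0]])
regimes_new = [1, 1, 2, 2]
y_new = np.array([7.5, 3.5, 10.0, 7.5])

# Each observation uses the intercept of its own regime
alpha = np.array([intercepts[r] for r in regimes_new])
y_hat = alpha + X_new @ slopes
score = r_squared(y_new, y_hat)
print(y_hat, score)
```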
When creating an instance of the SpatialFixedEffectsRegressor class, you can define the spatial_weights_definition parameter to obtain spatial diagnostics after training the model.
See the SpatialFixedEffectsRegressor class in Python API Reference for Oracle Spatial AI for more information.
The following example uses the block_groups SpatialDataFrame and the functions defined in Spatial Regimes to create the regimes by splitting the geographical area into a grid, where each cell represents a regime. It then trains the Spatial Fixed Effects model. Finally, using the test set, it calls the predict and score methods to estimate the target variable and the R-squared metric, respectively.
from oraclesai.preprocessing import spatial_train_test_split
from oraclesai.weights import KNNWeightsDefinition
from oraclesai.regression import SpatialFixedEffectsRegressor
from oraclesai.pipeline import SpatialPipeline
from sklearn.preprocessing import StandardScaler
# Create a categorical variable by splitting the geographic region into a grid
# (create_grid is one of the functions defined in Spatial Regimes)
block_groups_grid = create_grid(block_groups, "GRID_ID", nrows=3, ncols=3)
# Select the target variable, the explanatory variables, the regime column, and the geometry
X = block_groups_grid[['MEDIAN_INCOME', 'MEAN_AGE', 'MEAN_EDUCATION_LEVEL', 'HOUSE_VALUE', 'INTERNET', 'GRID_ID', 'geometry']]
# Define the training and test sets
X_train, X_test, _, _, _, _ = spatial_train_test_split(X, y="MEDIAN_INCOME", test_size=0.2, random_state=32)
# Get the regime values
regimes_train = X_train["GRID_ID"].values.tolist()
regimes_test = X_test["GRID_ID"].values.tolist()
# Discard the categorical variable
X_train = X_train.drop("GRID_ID")
X_test = X_test.drop("GRID_ID")
# Define the spatial weights
weights_definition = KNNWeightsDefinition(k=10)
# Create a Spatial Fixed Effects Regressor model
sfe_model = SpatialFixedEffectsRegressor(spatial_weights_definition=weights_definition)
# Add the model to a spatial pipeline along with a preprocessing step
sfe_pipeline = SpatialPipeline([('scale', StandardScaler()), ('sfe', sfe_model)])
# Train the model using "MEDIAN_INCOME" as the target variable and specifying the regimes
sfe_pipeline.fit(X_train, "MEDIAN_INCOME", sfe__regimes=regimes_train)
# Print the predictions with the test set
sfe_predictions_test = sfe_pipeline.predict(X_test.drop(["MEDIAN_INCOME"]), sfe__regimes=regimes_test).flatten()
print(f"\n>> predictions (X_test):\n {sfe_predictions_test[:10]}")
# Print the score with the test set
sfe_r2_score = sfe_pipeline.score(X_test, y="MEDIAN_INCOME", sfe__regimes=regimes_test)
print(f"\n>> r2_score (X_test):\n {sfe_r2_score}")
The program prints the predictions of the target variable for the first 10 observations and the R-squared metric for the test set, as shown:
>> predictions (X_test):
[101512.84282764 109422.92724391 29615.01694646 29230.32429018
162356.33498145 53108.14145735 105985.63259313 28588.56284749
81056.36661461 19790.46314804]
>> r2_score (X_test):
0.6701128016747615
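The regimes in the example above come from a grid laid over the study area. As a rough illustration of that idea, the sketch below assigns a cell identifier to each point from its coordinates. It is not the create_grid function from Spatial Regimes: the helper assign_grid_ids is hypothetical, it works on point coordinates (for example, centroids) rather than geometries, and the 1-based, row-major numbering is an assumption.

```python
import numpy as np

def assign_grid_ids(x, y, nrows=3, ncols=3):
    """Return a 1-based, row-major grid cell id for each (x, y) point (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Interior cell boundaries along each axis of the bounding box
    x_edges = np.linspace(x.min(), x.max(), ncols + 1)[1:-1]
    y_edges = np.linspace(y.min(), y.max(), nrows + 1)[1:-1]
    # searchsorted on the interior edges yields indices 0..ncols-1 and 0..nrows-1
    col = np.searchsorted(x_edges, x, side="right")
    row = np.searchsorted(y_edges, y, side="right")
    return row * ncols + col + 1

x = np.array([0.0, 5.0, 9.9, 0.5, 10.0])
y = np.array([0.0, 5.0, 0.1, 9.0, 10.0])
grid_ids = assign_grid_ids(x, y)
print(grid_ids)  # [1 5 3 7 9]
```

Each identifier can then serve as the categorical regime value for the corresponding observation.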
The intercept values for each regime can be viewed using the summary property. If the spatial_weights_definition parameter was defined when creating the regressor, the summary also includes spatial statistics, such as Moran’s I and the Lagrange Multiplier tests for spatial lag and spatial error.
REGRESSION
----------
SUMMARY OF OUTPUT: ORDINARY LEAST SQUARES - REGIMES
---------------------------------------------------
Data set : unknown
Weights matrix : unknown
Dependent Variable : dep_var Number of Observations: 2750
Mean dependent var : 69703.4815 Number of Variables : 12
S.D. dependent var : 39838.5789 Degrees of Freedom : 2738
R-squared : 0.6573
Adjusted R-squared : 0.6559
Sum squared residual:1495203246049.754 F-statistic : 477.4024
Sigma-square :546093223.539 Prob(F-statistic) : 0
S.E. of regression : 23368.638 Log likelihood : -31558.731
Sigma-square ML :543710271.291 Akaike info criterion : 63141.461
S.E of regression ML: 23317.5957 Schwarz criterion : 63212.494
------------------------------------------------------------------------------------
Variable Coefficient Std.Error t-Statistic Probability
------------------------------------------------------------------------------------
1_CONSTANT 75646.5430042 1406.0974938 53.7989317 0.0000000
2_CONSTANT 77794.0850074 1338.3185516 58.1282273 0.0000000
3_CONSTANT 58981.5644323 1948.7462992 30.2664151 0.0000000
4_CONSTANT 60320.9906786 1002.6995461 60.1585898 0.0000000
5_CONSTANT 69884.3635458 1076.5155202 64.9171909 0.0000000
6_CONSTANT 75355.5269590 1338.6764983 56.2910659 0.0000000
7_CONSTANT 71531.4267958 1445.6625603 49.4800300 0.0000000
8_CONSTANT 72960.0800416 1983.5523209 36.7825337 0.0000000
_Global_MEAN_AGE 2989.5036511 583.1586204 5.1263988 0.0000003
_Global_MEAN_EDUCATION_LEVEL 6304.4360113 904.9392927 6.9666950 0.0000000
_Global_HOUSE_VALUE 21452.9209086 664.4420803 32.2871196 0.0000000
_Global_INTERNET 8352.1786588 664.9940434 12.5597797 0.0000000
------------------------------------------------------------------------------------
Regimes variable: unknown
REGRESSION DIAGNOSTICS
MULTICOLLINEARITY CONDITION NUMBER 4.274
TEST ON NORMALITY OF ERRORS
TEST DF VALUE PROB
Jarque-Bera 2 1415.811 0.0000
DIAGNOSTICS FOR HETEROSKEDASTICITY
RANDOM COEFFICIENTS
TEST DF VALUE PROB
Breusch-Pagan test 11 1252.140 0.0000
Koenker-Bassett test 11 486.455 0.0000
DIAGNOSTICS FOR SPATIAL DEPENDENCE
TEST MI/DF VALUE PROB
Moran's I (error) 0.2201 27.742 0.0000
Lagrange Multiplier (lag) 1 317.696 0.0000
Robust LM (lag) 1 1.495 0.2214
Lagrange Multiplier (error) 1 722.582 0.0000
Robust LM (error) 1 406.382 0.0000
Lagrange Multiplier (SARMA) 2 724.078 0.0000
REGIMES DIAGNOSTICS - CHOW TEST
VARIABLE DF VALUE PROB
CONSTANT 7 184.738 0.0000
Global test 7 184.738 0.0000
================================ END OF REPORT =====================================
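The Moran's I (error) entry in the report measures spatial autocorrelation in the regression residuals. The sketch below shows how such a statistic can be computed with NumPy, assuming row-standardized k-nearest-neighbor weights. The helpers knn_weights and morans_i are hypothetical and are not part of the library, the data are synthetic, and the sketch computes only the statistic itself, not the standardized value or probability shown in the report.

```python
import numpy as np

def knn_weights(coords, k):
    """Dense, row-standardized k-nearest-neighbor weights matrix (hypothetical helper)."""
    coords = np.asarray(coords, dtype=float)
    n = len(coords)
    # Pairwise Euclidean distances; a point is never its own neighbor
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)
    neighbors = np.argsort(d, axis=1)[:, :k]
    W = np.zeros((n, n))
    W[np.arange(n)[:, None], neighbors] = 1.0 / k
    return W

def morans_i(values, W):
    """Moran's I: (n / S0) * (z' W z) / (z' z), where z holds the centered values."""
    z = np.asarray(values, dtype=float)
    z = z - z.mean()
    n = len(z)
    return (n / W.sum()) * (z @ W @ z) / (z @ z)

# Synthetic 10 x 10 lattice of points
side = 10
gx, gy = np.meshgrid(np.arange(side), np.arange(side))
coords = np.column_stack([gx.ravel(), gy.ravel()])
W = knn_weights(coords, k=4)

smooth = coords[:, 0] + coords[:, 1]                          # values vary gradually over space
noise = np.random.default_rng(0).normal(size=len(coords))     # values with no spatial pattern

i_smooth = morans_i(smooth, W)
i_noise = morans_i(noise, W)
print(f"Moran's I (smooth surface): {i_smooth:.3f}")
print(f"Moran's I (random noise):   {i_noise:.3f}")
```

A value well above zero, as for the smooth surface, indicates that neighboring observations have similar values. When this happens for residuals, it suggests that the model leaves spatial structure unexplained.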