Spatial Fixed Effects

The spatial fixed effects algorithm estimates a separate intercept (constant term) for each regime, while all other model coefficients are shared across regimes. It is a simplified version of the spatial regimes algorithm.

The SpatialFixedEffectsRegressor class fits a regression model in which each regime has its own constant term and all other coefficients are the same for every regime. To predict new values, it looks up the constant term for each observation's regime and uses it in the regression equation together with the shared coefficients. You can also pass the spatial_weights_definition parameter to obtain spatial diagnostics, which help you analyze the input features and fine-tune the model.
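As a rough illustration of the idea, a model with one intercept per regime and shared slopes can be estimated by ordinary least squares on a design matrix that has one indicator column per regime. The sketch below uses only NumPy and synthetic data. The function name fit_fixed_effects is hypothetical and is not part of the oraclesai API, and the library's own estimation may differ in its details.

```python
import numpy as np

def fit_fixed_effects(X, y, regimes):
    """Least-squares fit with one intercept per regime and shared slopes.

    Hypothetical helper for illustration only (not the oraclesai API).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    regimes = np.asarray(regimes).ravel()

    # Map each regime label to a column index, then build one-hot indicators.
    labels, codes = np.unique(regimes, return_inverse=True)
    indicators = np.eye(len(labels))[codes]

    # Design matrix: regime indicators first, then the shared features.
    design = np.hstack([indicators, X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)

    intercepts = dict(zip(labels.tolist(), coef[:len(labels)]))
    slopes = coef[len(labels):]
    return intercepts, slopes

# Synthetic data: three regimes with different intercepts, same slopes.
rng = np.random.default_rng(0)
n = 300
regimes = rng.integers(0, 3, size=n)
X = rng.normal(size=(n, 2))
true_intercepts = np.array([10.0, 20.0, 30.0])
true_slopes = np.array([2.0, -1.0])
y = true_intercepts[regimes] + X @ true_slopes + rng.normal(scale=0.1, size=n)

intercepts, slopes = fit_fixed_effects(X, y, regimes)
print("intercept per regime:", intercepts)
print("shared slopes:", slopes)
```

The estimated intercepts differ by regime, while a single pair of slopes is recovered for all observations, which is the structure that the summary report later shows as the n_CONSTANT and _Global_ rows.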

The following table describes the main methods of the SpatialFixedEffectsRegressor class.

Method        Description
-----------   -----------
fit           Trains the model. The regimes parameter indicates the categorical variable used as regime. The intercept of the linear equation is different for each regime, while the rest of the parameters are shared across regimes.
predict       Estimates the target variable for new data. For each observation, the algorithm takes the intercept of the corresponding regime (according to the regimes parameter) and uses it along with the shared parameters.
fit_predict   Calls the fit and predict methods sequentially with the training data.
score         Returns the R-squared statistic for the given data. For each observation, it uses the intercept associated with the corresponding regime, according to the regimes parameter.
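The prediction and scoring steps described in the table can be sketched with plain NumPy. The intercepts, slope, and data below are made-up values chosen only to make the arithmetic easy to follow, and the r_squared helper is a hypothetical stand-in for what the score method reports.

```python
import numpy as np

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

# Hypothetical fitted parameters: one intercept per regime, one shared slope.
intercepts = {"A": 10.0, "B": 20.0}
slopes = np.array([2.0])

# New observations, each tagged with the regime it belongs to.
X_new = np.array([[1.0], [2.0], [3.0], [4.0]])
regimes_new = ["A", "A", "B", "B"]
y_new = np.array([12.0, 14.0, 26.0, 29.0])

# Prediction: regime-specific intercept plus the shared linear term.
const = np.array([intercepts[r] for r in regimes_new])
y_hat = const + X_new @ slopes

score = r_squared(y_new, y_hat)
print(y_hat)
print(score)
```

Only the intercept lookup depends on the regime; the slope term is computed the same way for every observation.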

When creating an instance of the SpatialFixedEffectsRegressor class, you can define the spatial_weights_definition parameter to obtain spatial diagnostics after training the model.

See the SpatialFixedEffectsRegressor class in Python API Reference for Oracle Spatial AI for more information.

The following example uses the block_groups SpatialDataFrame and the functions defined in Spatial Regimes to create the regimes by splitting the geographical area into a grid, where each cell represents a regime.

It then trains the Spatial Fixed Effects model. Finally, using the test set, it calls the predict and score methods to estimate the target variable and the R-squared metric, respectively.
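Before the full example, the grid step can be pictured with a small stand-alone sketch that assigns a cell ID to each point from its coordinates. This is not the create_grid function from Spatial Regimes, which works on a SpatialDataFrame. The assign_grid_ids helper, the row-major numbering that starts at 1, and the use of point coordinates (for instance, centroids) are all assumptions made for illustration.

```python
import numpy as np

def assign_grid_ids(x, y, nrows, ncols):
    """Assign each (x, y) point to a cell of an nrows x ncols grid.

    Hypothetical helper: cells are numbered 1..nrows*ncols in row-major
    order over the bounding box of the points.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)

    # Evenly spaced cell boundaries over the bounding box.
    x_edges = np.linspace(x.min(), x.max(), ncols + 1)
    y_edges = np.linspace(y.min(), y.max(), nrows + 1)

    # Use only the interior edges so points on the outer boundary
    # fall into the first or last cell instead of outside the grid.
    col = np.digitize(x, x_edges[1:-1])
    row = np.digitize(y, y_edges[1:-1])
    return row * ncols + col + 1

x = [0.0, 1.0, 2.0, 3.0]
y = [0.0, 0.0, 3.0, 3.0]
grid_ids = assign_grid_ids(x, y, nrows=3, ncols=3)
print(grid_ids)
```

Each resulting ID plays the role of a regime label: observations that share an ID share an intercept in the fitted model.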

from oraclesai.preprocessing import spatial_train_test_split 
from oraclesai.weights import KNNWeightsDefinition 
from oraclesai.regression import SpatialFixedEffectsRegressor 
from oraclesai.pipeline import SpatialPipeline 
from sklearn.preprocessing import StandardScaler 

# Create a categorical variable by splitting the geographic region into a 3x3 grid
# (create_grid is one of the functions defined in Spatial Regimes)
block_groups_grid = create_grid(block_groups, "GRID_ID", nrows=3, ncols=3)

# Select the target variable, the explanatory variables, the regime column, and the geometry
X = block_groups_grid[['MEDIAN_INCOME', 'MEAN_AGE', 'MEAN_EDUCATION_LEVEL', 'HOUSE_VALUE', 'INTERNET', 'GRID_ID', 'geometry']] 

# Define the training and test sets 
X_train, X_test, _, _, _, _ = spatial_train_test_split(X, y="MEDIAN_INCOME", test_size=0.2, random_state=32) 

# Get the regime values 
regimes_train = X_train["GRID_ID"].values.tolist() 
regimes_test = X_test["GRID_ID"].values.tolist() 

# Discard the categorical variable 
X_train = X_train.drop("GRID_ID") 
X_test = X_test.drop("GRID_ID") 

# Define the spatial weights 
weights_definition = KNNWeightsDefinition(k=10) 

# Create a Spatial Fixed Effects Regressor model 
sfe_model = SpatialFixedEffectsRegressor(spatial_weights_definition=weights_definition) 

# Add the model to a spatial pipeline along with a preprocessing step 
sfe_pipeline = SpatialPipeline([('scale', StandardScaler()), ('sfe', sfe_model)]) 

# Train the model using "MEDIAN_INCOME" as the target variable and specifying the regimes 
sfe_pipeline.fit(X_train, "MEDIAN_INCOME", sfe__regimes=regimes_train) 

# Print the predictions with the test set 
sfe_predictions_test = sfe_pipeline.predict(X_test.drop(["MEDIAN_INCOME"]), sfe__regimes=regimes_test).flatten() 
print(f"\n>> predictions (X_test):\n {sfe_predictions_test[:10]}") 

# Print the score with the test set 
sfe_r2_score = sfe_pipeline.score(X_test, y="MEDIAN_INCOME", sfe__regimes=regimes_test) 
print(f"\n>> r2_score (X_test):\n {sfe_r2_score}")

The program prints the predictions of the target variable for the first 10 observations and the R-squared metric for the test set, as shown:

>> predictions (X_test):
 [101512.84282764 109422.92724391  29615.01694646  29230.32429018
 162356.33498145  53108.14145735 105985.63259313  28588.56284749
  81056.36661461  19790.46314804]

>> r2_score (X_test):
 0.6701128016747615

The intercept values for each regime can be inspected using the summary property. If the spatial_weights_definition parameter was defined when creating the regressor, the summary also includes spatial statistics, such as Moran’s I and the Lagrange Multiplier tests for spatial lag and spatial error.

REGRESSION
----------
SUMMARY OF OUTPUT: ORDINARY LEAST SQUARES - REGIMES
---------------------------------------------------
Data set            :     unknown
Weights matrix      :     unknown
Dependent Variable  :     dep_var                Number of Observations:        2750
Mean dependent var  :  69703.4815                Number of Variables   :          12
S.D. dependent var  :  39838.5789                Degrees of Freedom    :        2738
R-squared           :      0.6573
Adjusted R-squared  :      0.6559
Sum squared residual:1495203246049.754                F-statistic           :    477.4024
Sigma-square        :546093223.539                Prob(F-statistic)     :           0
S.E. of regression  :   23368.638                Log likelihood        :  -31558.731
Sigma-square ML     :543710271.291                Akaike info criterion :   63141.461
S.E of regression ML:  23317.5957                Schwarz criterion     :   63212.494

------------------------------------------------------------------------------------
            Variable     Coefficient       Std.Error     t-Statistic     Probability
------------------------------------------------------------------------------------
          1_CONSTANT    75646.5430042    1406.0974938      53.7989317       0.0000000
          2_CONSTANT    77794.0850074    1338.3185516      58.1282273       0.0000000
          3_CONSTANT    58981.5644323    1948.7462992      30.2664151       0.0000000
          4_CONSTANT    60320.9906786    1002.6995461      60.1585898       0.0000000
          5_CONSTANT    69884.3635458    1076.5155202      64.9171909       0.0000000
          6_CONSTANT    75355.5269590    1338.6764983      56.2910659       0.0000000
          7_CONSTANT    71531.4267958    1445.6625603      49.4800300       0.0000000
          8_CONSTANT    72960.0800416    1983.5523209      36.7825337       0.0000000
    _Global_MEAN_AGE    2989.5036511     583.1586204       5.1263988       0.0000003
_Global_MEAN_EDUCATION_LEVEL    6304.4360113     904.9392927       6.9666950       0.0000000
 _Global_HOUSE_VALUE    21452.9209086     664.4420803      32.2871196       0.0000000
    _Global_INTERNET    8352.1786588     664.9940434      12.5597797       0.0000000
------------------------------------------------------------------------------------
Regimes variable: unknown

REGRESSION DIAGNOSTICS
MULTICOLLINEARITY CONDITION NUMBER            4.274

TEST ON NORMALITY OF ERRORS
TEST                             DF        VALUE           PROB
Jarque-Bera                       2        1415.811           0.0000

DIAGNOSTICS FOR HETEROSKEDASTICITY
RANDOM COEFFICIENTS
TEST                             DF        VALUE           PROB
Breusch-Pagan test               11        1252.140           0.0000
Koenker-Bassett test             11         486.455           0.0000

DIAGNOSTICS FOR SPATIAL DEPENDENCE
TEST                           MI/DF       VALUE           PROB
Moran's I (error)              0.2201        27.742           0.0000
Lagrange Multiplier (lag)         1         317.696           0.0000
Robust LM (lag)                   1           1.495           0.2214
Lagrange Multiplier (error)       1         722.582           0.0000
Robust LM (error)                 1         406.382           0.0000
Lagrange Multiplier (SARMA)       2         724.078           0.0000


REGIMES DIAGNOSTICS - CHOW TEST
                 VARIABLE        DF        VALUE           PROB
                 CONSTANT         7         184.738           0.0000
              Global test         7         184.738           0.0000
================================ END OF REPORT =====================================
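
To give a sense of what the Moran's I (error) entry under DIAGNOSTICS FOR SPATIAL DEPENDENCE measures, the following sketch computes the raw Moran's I index for a set of residuals using row-standardized k-nearest-neighbor weights. It is a simplified illustration in plain NumPy: the knn_weights and morans_i helpers are hypothetical, Euclidean distance on point coordinates is an assumption, and the sketch does not reproduce the test statistic or probability that the report prints next to the index.

```python
import numpy as np

def knn_weights(coords, k):
    """Row-standardized k-nearest-neighbor weights matrix (hypothetical helper)."""
    coords = np.asarray(coords, dtype=float)
    n = len(coords)
    # Pairwise Euclidean distances; a point is never its own neighbor.
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)
    neighbors = np.argsort(dist, axis=1)[:, :k]
    W = np.zeros((n, n))
    W[np.arange(n)[:, None], neighbors] = 1.0 / k
    return W

def morans_i(values, W):
    """Moran's I: (n / sum(W)) * (z' W z) / (z' z), with z the centered values."""
    values = np.asarray(values, dtype=float)
    z = values - values.mean()
    return (len(z) / W.sum()) * (z @ W @ z) / (z @ z)

# Points on a 6 x 6 lattice with "residuals" that increase from west to east,
# so nearby points have similar values.
xs, ys = np.meshgrid(np.arange(6), np.arange(6))
coords = np.column_stack([xs.ravel(), ys.ravel()])
residuals = xs.ravel().astype(float)

W = knn_weights(coords, k=4)
index = morans_i(residuals, W)
print(index)
```

A clearly positive index, as in this constructed case, means that neighboring observations tend to have similar residuals. In the report above, the significant Moran's I and Lagrange Multiplier results point the same way: the residuals are spatially dependent.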