GWR Classifier

The Geographically Weighted Regression (GWR) classifier is a binary classifier intended for data that exhibit spatial heterogeneity, that is, relationships between variables that vary from region to region.

The algorithm creates a local classifier for every observation in the dataset by incorporating the target and explanatory variables from the observations within its neighborhood, allowing the relationships between the independent and dependent variables to vary by locality.
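The idea can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the oraclesai implementation: it assumes a fixed bisquare kernel (the kernel reported in the summary output later in this section), fits each local model with plain gradient descent, and runs on synthetic data. The helper names `bisquare_weights`, `fit_local_logistic`, and `local_coefficients` are hypothetical.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def bisquare_weights(distances, bandwidth):
    """Bisquare kernel: weight 1 at distance 0, falling to 0 at the bandwidth."""
    scaled = distances / bandwidth
    return np.where(scaled < 1.0, (1.0 - scaled**2) ** 2, 0.0)


def fit_local_logistic(X, y, weights, n_iter=200, lr=1.0):
    """Weighted logistic regression fitted by plain gradient descent."""
    Xb = np.column_stack([np.ones(len(X)), X])  # add an intercept column
    beta = np.zeros(Xb.shape[1])
    for _ in range(n_iter):
        p = sigmoid(Xb @ beta)
        # Gradient of the weighted cross-entropy, normalized by the total weight
        grad = Xb.T @ (weights * (p - y)) / weights.sum()
        beta -= lr * grad
    return beta


def local_coefficients(coords, X, y, bandwidth):
    """Fit one weighted model per location; returns (n_locations, n_features + 1)."""
    betas = []
    for point in coords:
        d = np.linalg.norm(coords - point, axis=1)
        w = bisquare_weights(d, bandwidth)
        betas.append(fit_local_logistic(X, y, w))
    return np.array(betas)


# Synthetic data: the effect of x flips sign between the west and east halves.
rng = np.random.default_rng(0)
coords = rng.uniform(0, 10, size=(200, 2))
x = rng.normal(size=200)
slope = np.where(coords[:, 0] < 5, 2.0, -2.0)
y = (rng.uniform(size=200) < sigmoid(slope * x)).astype(float)

betas = local_coefficients(coords, x.reshape(-1, 1), y, bandwidth=3.0)
west = betas[coords[:, 0] < 2, 1].mean()
east = betas[coords[:, 0] > 8, 1].mean()
print(f"mean local slope, west: {west:.2f}, east: {east:.2f}")
```

A single global logistic regression would average the two opposing effects toward zero, whereas the local models recover a positive slope in the west and a negative slope in the east.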

The classifier trains a logistic regression model for every sample in the dataset, incorporating the dependent and independent variables of locations falling within a specified bandwidth. The goal is to minimize the cross-entropy loss function defined as follows.

[Illustration: cross-entropy loss function (cross_entropy_function.png)]

In the preceding function, y is either 0 or 1, and h is the activation function for logistic regression, which is the sigmoid function.
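For reference, the textbook form of the binary cross-entropy for logistic regression is shown below. The notation is an assumption, since the original illustration is not reproduced here: m is the number of observations used by a model, θ is its coefficient vector, and x^(i), y^(i) are the explanatory variables and label of observation i. In a geographically weighted setting, each term of the sum would typically also be multiplied by a spatial weight.

```latex
J(\theta) = -\frac{1}{m} \sum_{i=1}^{m}
  \left[ y^{(i)} \log h_\theta\!\left(x^{(i)}\right)
       + \left(1 - y^{(i)}\right) \log\!\left(1 - h_\theta\!\left(x^{(i)}\right)\right) \right],
\qquad
h_\theta(x) = \frac{1}{1 + e^{-\theta^{\top} x}}
```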

The following table describes the main methods of the GWRClassifier class.

Method	Description
`fit`	Trains the model. The algorithm requires a bandwidth, which can be set by the user with the `bandwidth` parameter or by specifying the `spatial_weights_definition` parameter. If the `bandwidth` parameter is defined, the algorithm ignores the bandwidth associated with the spatial weights. The bandwidth can be either a threshold distance or a value of k for the K-Nearest Neighbors method. If neither the `bandwidth` nor the `spatial_weights_definition` parameter is defined, then the bandwidth is estimated internally based on the geometries.
`predict`	To make predictions, GWR trains a model for each observation in the prediction set using neighboring observations from the training data. Then, it uses those models to estimate the target variable.
`fit_predict`	Calls the `fit` and `predict` methods sequentially with the training data.
`score`	Returns the model's accuracy for the given data.

See the GWRClassifier class in Python API Reference for Oracle Spatial AI for more information.
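The two kinds of bandwidth mentioned in the `fit` description, a threshold distance and a value of k, select neighbors differently. The following NumPy sketch illustrates the difference on toy coordinates. It does not use the oraclesai API; the function names are hypothetical, and excluding the focal observation from its own neighbor set is an illustrative choice.

```python
import numpy as np


def neighbors_within_distance(coords, index, threshold):
    """Indices of observations no farther than `threshold` from observation `index`."""
    d = np.linalg.norm(coords - coords[index], axis=1)
    mask = d <= threshold
    mask[index] = False  # exclude the observation itself
    return np.flatnonzero(mask)


def k_nearest_neighbors(coords, index, k):
    """Indices of the k observations closest to observation `index`."""
    d = np.linalg.norm(coords - coords[index], axis=1)
    d[index] = np.inf  # exclude the observation itself
    return np.argsort(d)[:k]


coords = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [5.0, 5.0], [6.0, 5.0]])

# Fixed distance: the number of neighbors depends on the local density.
print(neighbors_within_distance(coords, 0, threshold=2.5))  # [1 2]
print(neighbors_within_distance(coords, 3, threshold=2.5))  # [4]

# Fixed k: the number of neighbors is constant, the reach adapts.
print(k_nearest_neighbors(coords, 3, k=2))  # [4 2]
```

With a threshold distance, sparse areas contribute fewer observations to each local model. With a fixed k, every local model sees the same number of observations, but in sparse areas they may lie far away.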

The following example uses the block_groups SpatialDataFrame and performs the following steps:

Creates a categorical variable based on the MEDIAN_INCOME column to be used as the target variable.
Creates an instance of GWRClassifier.
Trains the model using a training set.
Prints the trained model's predictions and its accuracy on the test set.

import pandas as pd 
from oraclesai.preprocessing import spatial_train_test_split 
from oraclesai.weights import DistanceBandWeightsDefinition 
from oraclesai.classification import GWRClassifier 
from oraclesai.pipeline import SpatialPipeline 
from sklearn.preprocessing import StandardScaler 

# Create a binary variable, "INCOME_LABEL", by splitting the median income at its median (the 0.5 quantile)
block_groups_extended = block_groups.add_column("INCOME_LABEL", pd.qcut(block_groups['MEDIAN_INCOME'].values, [0, 0.5, 1], labels=[0, 1]).to_list()) 

# Reproject to a projected coordinate reference system (EPSG:3857)
block_groups_extended = block_groups_extended.to_crs('epsg:3857') 

# Define the target and explanatory variables 
X = block_groups_extended[['INCOME_LABEL', 'MEAN_AGE', 'MEAN_EDUCATION_LEVEL', 'HOUSE_VALUE', 'INTERNET', 'geometry']] 

# Define the training and test sets 
X_train, X_test, _, _, _, _ = spatial_train_test_split(X, y="INCOME_LABEL", test_size=0.2, random_state=32)

# Define the spatial weights definition 
weights_definition = DistanceBandWeightsDefinition(threshold=15000) 

# Create an instance of GWRClassifier 
gwr_classifier = GWRClassifier(spatial_weights_definition=weights_definition) 

# Add the model to a spatial pipeline along with a pre-processing step 
classifier_pipeline = SpatialPipeline([('scale', StandardScaler()), ('gwr', gwr_classifier)]) 

# Train the model specifying the target variable 
classifier_pipeline.fit(X_train, "INCOME_LABEL") 

# Print the predictions with the test set 
gwr_predictions_test = classifier_pipeline.predict(X_test.drop("INCOME_LABEL")).flatten() 
print(f"\n>> predictions (X_test):\n {gwr_predictions_test[:10]}") 

# Print the accuracy with the test set 
gwr_accuracy_test = classifier_pipeline.score(X_test, "INCOME_LABEL") 
print(f"\n>> accuracy (X_test):\n {gwr_accuracy_test}")

The output consists of the predictions for the first 10 observations and the model's accuracy on the test set.

>> predictions (X_test):
 [1 1 0 0 1 0 1 0 0 0]

>> accuracy (X_test):
 0.8384279475982532
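Accuracy here is assumed to follow the usual definition: the fraction of observations whose predicted label matches the true label. The sketch below applies that definition to the ten predictions printed above. The true labels are invented for illustration only, and `accuracy` is a hypothetical helper, not part of oraclesai.

```python
import numpy as np


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    return float(np.mean(y_true == y_pred))


# First 10 predictions from the output above
predictions = np.array([1, 1, 0, 0, 1, 0, 1, 0, 0, 0])

# Hypothetical true labels (made up for this example; two of them disagree)
hypothetical_labels = np.array([1, 0, 0, 0, 1, 0, 1, 1, 0, 0])

print(accuracy(hypothetical_labels, predictions))  # 0.8
```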

The summary property includes statistics for both a global logistic regression and the GWRClassifier. For the estimated parameters of the GWRClassifier, it reports summary statistics (mean, standard deviation, minimum, median, and maximum) across all the local models.

===========================================================================
Model type                                                         Binomial
Number of observations:                                                2750
Number of covariates:                                                     5

Global Regression Results
---------------------------------------------------------------------------
Deviance:                                                          2088.938
Log-likelihood:                                                   -1044.469
AIC:                                                               2098.938
AICc:                                                              2098.960
BIC:                                                             -19649.694
Percent deviance explained:                                           0.452
Adj. percent deviance explained:                                      0.451

Variable                              Est.         SE  t(Est/SE)    p-value
------------------------------- ---------- ---------- ---------- ----------
X0                                  -0.044      0.061     -0.717      0.473
X1                                   0.439      0.072      6.084      0.000
X2                                   0.685      0.104      6.603      0.000
X3                                   0.542      0.109      4.989      0.000
X4                                   1.298      0.092     14.088      0.000

Geographically Weighted Regression (GWR) Results
---------------------------------------------------------------------------
Spatial kernel:                                          Fixed bisquare
Bandwidth used:                                                   15000.000

Diagnostic information
---------------------------------------------------------------------------
Effective number of parameters (trace(S)):                           56.675
Degree of freedom (n - trace(S)):                                  2693.325
Log-likelihood:                                                    -888.994
AIC:                                                               1891.337
AICc:                                                              1893.765
BIC:                                                               2226.816
Percent deviance explained:                                         0.534
Adjusted percent deviance explained:                                0.524
Adj. alpha (95%):                                                     0.004
Adj. critical t value (95%):                                          2.850

Summary Statistics For GWR Parameter Estimates
---------------------------------------------------------------------------
Variable                   Mean        STD        Min     Median        Max
-------------------- ---------- ---------- ---------- ---------- ----------
X0                       -0.020      0.846     -1.630     -0.140      3.328
X1                        0.512      0.325      0.020      0.385      2.156
X2                        0.931      0.665     -1.213      1.168      2.893
X3                        0.995      0.981     -0.615      0.834      6.249
X4                        1.190      0.356      0.324      1.119      2.531
===========================================================================
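Several of the diagnostics in the summary are related by simple arithmetic, which can be checked directly from the reported values. The relations below are standard definitions that the numbers appear to follow (deviance as -2 times the log-likelihood, AIC as deviance plus twice the number of parameters, and a deviance-based BIC, which explains the negative global value). They are inferred from the output, not taken from the library's documentation.

```python
import math

# Values copied from the summary above
n_obs = 2750
n_covariates = 5
global_deviance = 2088.938
global_log_likelihood = -1044.469
global_aic = 2098.938
global_bic = -19649.694

gwr_log_likelihood = -888.994
gwr_trace_s = 56.675  # effective number of parameters
gwr_dof = 2693.325
gwr_aic = 1891.337

# Each entry maps a relation to (value computed here, value reported in the summary)
checks = {
    "global deviance = -2 * log-likelihood": (
        -2 * global_log_likelihood, global_deviance),
    "global AIC = deviance + 2 * covariates": (
        global_deviance + 2 * n_covariates, global_aic),
    "global BIC = deviance - (n - covariates) * ln(n)": (
        global_deviance - (n_obs - n_covariates) * math.log(n_obs), global_bic),
    "GWR degrees of freedom = n - trace(S)": (
        n_obs - gwr_trace_s, gwr_dof),
    "GWR AIC = -2 * log-likelihood + 2 * trace(S)": (
        -2 * gwr_log_likelihood + 2 * gwr_trace_s, gwr_aic),
}

for name, (computed, reported) in checks.items():
    print(f"{name}: computed {computed:.3f}, reported {reported:.3f}")
```

Small differences in the last digit are expected because the inputs are already rounded to three decimals. The lower AIC of the GWR model (1891.337 versus 2098.938) indicates that the local models fit this dataset better than the global regression, even after accounting for the larger effective number of parameters.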