Geographical Classifier
Similar to GeographicalRegressor, the GeographicalClassifier class trains a global model and multiple local models, and predicts by combining the weighted results from both. By defining the global_model and model_cls parameters, you can specify the scikit-learn global and local classifiers, respectively. Any scikit-learn classifier can be used, including Random Forest, Support Vector, Gradient Boosting, and Decision Tree classifiers.
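The weighted combination can be pictured with a minimal sketch. It is not the oraclesai implementation: it assumes each model yields per-class probabilities and that a single weight (named local_weight here, after the constructor argument used later in this section) sets the share given to the local model. The helper names and the probabilities are hypothetical.

```python
# Illustrative sketch only; the library's actual combination rule may differ.
# Assumption: local_weight is the share given to the local model and the
# remainder (1 - local_weight) goes to the global model.

def combine_probabilities(global_proba, local_proba, local_weight):
    """Blend two per-class probability lists into one."""
    if not 0.0 <= local_weight <= 1.0:
        raise ValueError("local_weight must be between 0 and 1")
    return [
        local_weight * p_local + (1.0 - local_weight) * p_global
        for p_global, p_local in zip(global_proba, local_proba)
    ]


def predict_class(global_proba, local_proba, local_weight):
    """Return the index of the class with the highest blended probability."""
    blended = combine_probabilities(global_proba, local_proba, local_weight)
    return max(range(len(blended)), key=blended.__getitem__)


# Hypothetical probabilities for three classes (low, medium, high value).
global_proba = [0.6, 0.3, 0.1]
local_proba = [0.2, 0.7, 0.1]

print(predict_class(global_proba, local_proba, local_weight=0.8))  # 1: local model dominates
print(predict_class(global_proba, local_proba, local_weight=0.1))  # 0: global model dominates
```

With a high local weight the prediction follows the local model, and with a low one it follows the global model, which is the trade-off the weight is meant to control.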
Both GeographicalClassifier and GeographicalRegressor extend the Geographical Random Forest algorithm in two ways: they allow underlying machine learning algorithms other than Random Forest, and they support training the local models in parallel, which helps performance scale. See [4] for more information on the Geographical Random Forest algorithm.
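Local models can be trained in parallel because each one depends only on its own neighborhood. The sketch below shows that structure with the standard library. It makes no claim about how the library schedules the work: fit_local_model is a hypothetical stand-in that memorizes the most common label, and the neighborhoods are made-up data.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def fit_local_model(neighbor_labels):
    """Hypothetical stand-in for fitting one local classifier.

    It memorizes the most common label in the neighborhood.
    """
    return Counter(neighbor_labels).most_common(1)[0][0]


def fit_local_models(neighborhoods, max_workers=4):
    """Fit one local model per neighborhood; each fit is independent."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map preserves input order, so result i belongs to neighborhood i.
        return list(pool.map(fit_local_model, neighborhoods))


# Hypothetical labels of the neighbors around three training locations.
neighborhoods = [[0, 0, 1], [2, 2, 2, 1], [1, 1, 0, 1]]
local_models = fit_local_models(neighborhoods)
print(local_models)  # [0, 2, 1]
```

Threads are used here only to keep the sketch short. CPU-bound fits in pure Python would need processes to run truly in parallel.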
The following table describes the main methods of the GeographicalClassifier class.
Method | Description |
---|---|
fit | First, the global model is built using the parameters provided at creation time. If the spatial relationship is not specified (by either the spatial_weights_definition or the bandwidth parameter), it is computed internally. Then, several local models are trained. |
predict | Predicts by combining the weighted results from the global model and the local models. |
fit_predict | Calls the fit and predict methods sequentially with the training data. |
score | Returns the model's accuracy for the given data. |
See the Geographical Classifier class in Python API Reference for Oracle Spatial AI for more information.
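The spatial relationship used by fit can be pictured as a neighbor list. The sketch below assumes a distance band means "every other observation within a threshold distance", measured as Euclidean distance on projected coordinates. Whether the library compares with <= or <, and which distance it uses, are assumptions. The function name and the coordinates are hypothetical.

```python
import math


def distance_band_neighbors(points, threshold):
    """Map each point index to the indices of points within `threshold`."""
    neighbors = {i: [] for i in range(len(points))}
    for i, p in enumerate(points):
        for j, q in enumerate(points):
            # A point is never its own neighbor.
            if i != j and math.dist(p, q) <= threshold:
                neighbors[i].append(j)
    return neighbors


# Hypothetical projected coordinates, in the same units as the threshold.
points = [(0.0, 0.0), (1000.0, 0.0), (0.0, 2000.0), (5000.0, 5000.0)]
print(distance_band_neighbors(points, threshold=2388.51))
# {0: [1, 2], 1: [0, 2], 2: [0, 1], 3: []}
```

The last point has no neighbors within the band, which is worth checking for when a threshold is chosen, since a local model trained on an empty neighborhood has nothing to learn from.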
The following code uses the houses_full SpatialDataFrame, which contains housing information for the city of Los Angeles. The example performs the following steps:
- Creates a categorical variable based on the HOUSE_VALUE_MEDIAN column.
- Defines the training and test sets.
- Creates an instance of GeographicalClassifier.
- Trains the local model using the RandomForestClassifier from scikit-learn.
- Calls the predict and score methods to estimate the target variable and the model's accuracy on a test set, respectively.
from oraclesai.preprocessing import spatial_train_test_split
from oraclesai.weights import DistanceBandWeightsDefinition
from sklearn.ensemble import RandomForestClassifier
from oraclesai.classification import GeographicalClassifier
# Define explanatory variables
feature_columns = [
    'BEDROOMS_TOTAL',
    'EDU_LEVEL_SCORE_MEDIAN',
    'POPULATION_DENSITY',
    'ROOMS_TOTAL',
    'COMPLETE_PLUMBING_PERC',
    'COMPLETE_KITCHEN_PERC',
    'HOUSE_AGE_MEDIAN',
    'RENTED_PERC',
    'UNITS_TOTAL'
]
# The target variable will be built from this column
target_column = 'HOUSE_VALUE_MEDIAN'
# Select a subset of columns
houses = houses_full[[target_column] + feature_columns]
# Remove rows with null values
houses = houses.dropna()
# Define training and test sets
X_train, X_test, y_train, y_test, geom_train, geom_test = spatial_train_test_split(
    houses,
    y=target_column,
    test_size=0.33,
    numpy_result=True,
    random_state=32,
)
# Define constants to create a categorical variable
y = houses[target_column].values
y_mean = y.mean()
y_std = y.std()
# House prices below the mean minus 0.5 std are considered low-value
# House prices above the mean plus 0.5 std are considered high-value
mid_low_price = y_mean - y_std * 0.5
mid_hi_price = y_mean + y_std * 0.5
# Define the function that generates the target variable based on the house value
def classify_house_value(house_value):
    if house_value < mid_low_price:
        return 0.0
    if house_value > mid_hi_price:
        return 2.0
    return 1.0
# Generate the target variable for the training and test sets
y_c_train = [classify_house_value(inc) for inc in y_train]
y_c_test = [classify_house_value(inc) for inc in y_test]
# Define the spatial weights
weights_definition = DistanceBandWeightsDefinition(threshold=2388.51)
# Create an instance of GeographicalClassifier
grfc_model = GeographicalClassifier(
    model_cls=RandomForestClassifier,
    n_estimators=10,
    local_weight=0.80,
    spatial_weights_definition=weights_definition,
    random_state=32,
)
# Train the model
grfc_model.fit(X_train, y=y_c_train, geometries=geom_train, n_jobs=-1)
# Print the predictions with the test set
grfc_predictions_test = grfc_model.predict(X_test, geometries=geom_test).flatten()
print(f"\n>> predictions (X_test):\n {grfc_predictions_test[:10]}")
# Print the score with the test set
grfc_accuracy = grfc_model.score(X_test, y_c_test, geometries=geom_test)
print(f"\n>> accuracy (X_test):\n {grfc_accuracy}")
The output consists of the predictions of the first 10 observations of the test set and the model's accuracy using the same test set.
>> predictions (X_test):
[1 1 0 2 2 1 1 0 0 0]
>> accuracy (X_test):
0.7343004295345901
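The accuracy value can be read as the fraction of test observations whose predicted class matches the true class. The sketch below assumes that standard definition of accuracy. It does not call the library, and the function name and the labels are hypothetical.

```python
def accuracy(y_true, y_pred):
    """Return the fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one observation is required")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


# Hypothetical labels: 0 = low, 1 = medium, 2 = high house value.
y_true = [0, 1, 2, 1, 0]
y_pred = [0, 1, 1, 1, 2]
print(accuracy(y_true, y_pred))  # 0.6
```

Under this reading, the accuracy of about 0.73 reported above would mean that roughly 73% of the test observations were assigned the correct class.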