Train the Model

You can use the LISAHotspotClustering class to identify clusters and outliers. Internally, it performs all of the analysis done so far: it calculates the Local Moran's I for each observation in the dataset and assigns each observation to the corresponding quadrant.

Depending on the build parameters, some observations can be labeled with -1 (undefined). For example, in the following code, setting max_p_value=0.05 causes every observation with a p-value greater than 0.05 to be labeled with -1, so that only statistically significant observations keep their quadrant label.

from oraclesai.clustering import LISAHotspotClustering
 
# Create an instance of LISAHotspotClustering
lisa_model = LISAHotspotClustering(column="MEDIAN_INCOME",
                                   max_p_value=0.05,
                                   spatial_weights_definition=weights_definition)
 
# Train the model
lisa_model.fit(X)
 
# Print the labels
lisa_labels = lisa_model.labels_
print(f"labels = {lisa_labels[:10]}")

The output of the program shows the labels, or quadrants, assigned to the first ten observations of the training set.

labels = [ 2  2  1  1  1  1  1 -1 -1 -1]
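The -1 entries in this output come from the p-value filter described earlier. The following is a minimal NumPy sketch of that idea, not the library's actual implementation. The quadrants and p_values arrays are made-up illustrative values, chosen only so that the result matches the labels printed above. The sketch also counts how many observations fall under each label.

```python
import numpy as np

# Illustrative values only: these are NOT taken from the training set.
quadrants = np.array([2, 2, 1, 1, 1, 1, 1, 3, 4, 1])
p_values = np.array([0.01, 0.03, 0.02, 0.001, 0.04, 0.01, 0.02, 0.20, 0.35, 0.08])
max_p_value = 0.05

# Keep the quadrant when the p-value is within the threshold; otherwise use -1.
labels = np.where(p_values <= max_p_value, quadrants, -1)
print(f"labels = {labels}")

# Count the observations assigned to each label.
values, counts = np.unique(labels, return_counts=True)
label_counts = {int(v): int(c) for v, c in zip(values, counts)}
print(f"counts = {label_counts}")
```

The same np.unique call can be applied to lisa_model.labels_ to summarize the labels of the full training set.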

Hot spots are labeled with the number 1, while cold spots are labeled with the number 3. To keep only the hot and cold spots and mark every other observation as -1, run the following code.

import numpy as np

# Keep the odd labels (1 = hot spot, 3 = cold spot); replace all others with -1
hotcold_labels = np.where(lisa_labels % 2 != 0, lisa_labels, -1)
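The parity test works because NumPy evaluates -1 % 2 as 1, so undefined observations pass the test and keep their -1 label. The following sketch checks this on a small hypothetical array (sample_labels is made up for illustration) and compares the result with an equivalent selection based on np.isin, which states the intended labels explicitly.

```python
import numpy as np

# Hypothetical labels for illustration; in practice, use lisa_model.labels_.
sample_labels = np.array([2, 2, 1, 1, 1, 3, 4, -1, -1, 3])

# Parity-based selection: odd labels (1, 3, and -1) pass through unchanged.
parity_based = np.where(sample_labels % 2 != 0, sample_labels, -1)

# Explicit selection: keep only labels 1 (hot spot) and 3 (cold spot).
explicit = np.where(np.isin(sample_labels, [1, 3]), sample_labels, -1)

print(parity_based)
print(np.array_equal(parity_based, explicit))
```

Both approaches give the same result here. The np.isin form may be easier to read and does not depend on how the modulo operator treats negative numbers.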

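A spatial outlier is an observation whose value differs markedly from the values of its neighbors, such as a high value surrounded by low values or the reverse. Spatial outliers are represented by the labels 2 and 4. To keep only the spatial outliers and mark every other observation as -1, run the following code.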

# Keep the even labels (2 and 4, the spatial outliers); replace all others with -1
outlier_labels = np.where(lisa_labels % 2 == 0, lisa_labels, -1)
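Once the outlier labels are isolated, a common next step is to find which observations they belong to. The following sketch, which uses a hypothetical sample_labels array in place of lisa_model.labels_, retrieves the positions of the spatial outliers so they can be used to select the matching rows of the training data.

```python
import numpy as np

# Hypothetical labels for illustration; in practice, use lisa_model.labels_.
sample_labels = np.array([2, 2, 1, 1, 1, 3, 4, -1, -1, 3])

# Same filter as above: keep labels 2 and 4, replace everything else with -1.
outlier_labels = np.where(sample_labels % 2 == 0, sample_labels, -1)

# Positions of the observations flagged as spatial outliers.
outlier_positions = np.flatnonzero(outlier_labels != -1)

print(outlier_labels)
print(outlier_positions)
```

The resulting positions are plain integer indices, so they can be passed to any structure that supports positional indexing.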