Explore the Data

Exploring the data helps you to understand the variables individually and how they interact.

Perform the following steps to explore the data:
  1. Understand the data by displaying the first observations of the training set with the head method. The following example uses the MEDIAN_INCOME column.
    from oraclesai import enable_geodataframes 
    enable_geodataframes(z)
    
    X = block_groups["MEDIAN_INCOME"] 
    z.show(X.head())

    The output is as shown:



  2. Define spatial weights, which establish the relationship between neighboring locations, to understand how each variable behaves across neighbors.

    Use the K-Nearest Neighbors (KNN) approach, in which the K nearest observations to each observation are considered its neighbors. The following example sets K to 10.

    from oraclesai.weights import KNNWeightsDefinition
    
    # Define spatial weights 
    weights_definition = KNNWeightsDefinition(k=10)
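
To make the neighbor relationship concrete, the following is a minimal NumPy sketch of how a K-nearest-neighbors weights matrix can be built from point coordinates. It is not the oraclesai implementation: the knn_weights helper, the sample coordinates, and the row-standardization (each neighbor gets weight 1/K) are assumptions made for illustration.

```python
import numpy as np

def knn_weights(coords, k):
    """Hypothetical helper: row-standardized (n, n) KNN weights matrix."""
    coords = np.asarray(coords, dtype=float)
    n = len(coords)
    # Pairwise Euclidean distances between all points
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    # A point is never its own neighbor
    np.fill_diagonal(dist, np.inf)
    # Column indices of the k closest points for each row
    nearest = np.argsort(dist, axis=1)[:, :k]
    w = np.zeros((n, n))
    # Assumption: every neighbor receives the same weight, 1/k
    w[np.arange(n)[:, None], nearest] = 1.0 / k
    return w

# Made-up point locations (for example, block group centroids)
coords = [(0, 0), (1, 0), (0, 1), (5, 5), (6, 5)]
w = knn_weights(coords, k=2)
print(w)
```

KNN relationships are not necessarily symmetric: in this sample, the second point is one of the two nearest neighbors of the last point, but the last point is not a neighbor of the second.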

  3. Calculate the global spatial autocorrelation.

    The Moran’s I statistic is a measure of spatial autocorrelation. When applied to the whole dataset, it measures the global spatial autocorrelation.

    • A positive and significant value indicates the presence of spatial clustering, where regions with similar values tend to be together, reflecting the effect of spatial dependence.
    • A negative and significant value indicates the presence of spatial dispersion, where regions with dissimilar values tend to be together (a checkerboard pattern), reflecting the effect of spatial heterogeneity.

    The following code creates the spatial weights from the weights definition and runs the Moran’s I test on the MEDIAN_INCOME column.

    from oraclesai.analysis import MoranITest 
    from oraclesai.weights import SpatialWeights 
    
    # Create spatial weights from definition 
    spatial_weights = SpatialWeights.create(X["geometry"].values, weights_definition) 
    
    # Run the Moran's I test 
    moran_test = MoranITest.create(X, spatial_weights, column_name="MEDIAN_INCOME") 
    
    # Print the Moran's I and the p-value 
    print("Moran's I = ", moran_test.i) 
    print("p_value = ", moran_test.p_value)

    The output of the program is as shown:

    Moran's I =  0.6086540661785302
    p_value =  0.001

    Here, Moran’s I is positive (about 0.61) and significant (p-value of 0.001), which indicates the presence of spatial dependence: observations with similar income form clusters. However, the global statistic does not indicate where those clusters are located.
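
For intuition about what the statistic measures, the following sketch computes Moran's I directly with NumPy on a toy chain of six locations, and estimates a pseudo p-value by randomly permuting the values. The helper names, the toy data, and the use of 999 permutations are assumptions for illustration; the source does not state how MoranITest computes its p-value.

```python
import numpy as np

def morans_i(values, w):
    """Global Moran's I: (n / S0) * (z' W z) / (z' z), z = deviations from the mean."""
    z = np.asarray(values, dtype=float)
    z = z - z.mean()
    return (len(z) / w.sum()) * (z @ w @ z) / (z @ z)

def permutation_p_value(values, w, permutations=999, seed=0):
    """Upper-tail pseudo p-value: share of shuffles with I at least as large as observed."""
    rng = np.random.default_rng(seed)
    values = np.asarray(values, dtype=float)
    observed = morans_i(values, w)
    simulated = np.array(
        [morans_i(rng.permutation(values), w) for _ in range(permutations)]
    )
    return (np.sum(simulated >= observed) + 1) / (permutations + 1)

# Toy weights: six locations in a row, each linked to its immediate neighbors
n = 6
w = np.zeros((n, n))
for i in range(n - 1):
    w[i, i + 1] = w[i + 1, i] = 1.0

clustered = [1, 1, 1, 10, 10, 10]    # similar values sit together
alternating = [1, 10, 1, 10, 1, 10]  # checkerboard pattern

print("clustered   I =", morans_i(clustered, w))    # about 0.6
print("alternating I =", morans_i(alternating, w))  # about -1.0
p_value = permutation_p_value(clustered, w)
print("clustered pseudo p-value =", p_value)
```

With 999 permutations, the smallest pseudo p-value this scheme can report is 1/1000 = 0.001.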

  4. Calculate the local spatial autocorrelation.

    Use the Local Indicators of Spatial Association (LISA) method to find the clusters. The algorithm calculates the Local Moran’s I statistic for each observation.

    • A location with a positive local Moran’s I statistic indicates the presence of neighbors with similar values (either high or low values), representing hot or cold spots.
    • A location with a negative local Moran’s I statistic indicates the presence of neighbors with different values; it can be a high value surrounded by low values or a low value surrounded by high values, representing spatial outliers.

    The LocalMoranITest class calculates each observation's local spatial autocorrelation index based on a specific feature and spatial weights. The following code prints the local autocorrelation index and p-values of the first ten observations in the dataset.

    from oraclesai.analysis import LocalMoranITest 
    
    # Run the Local Moran's I test 
    local_moran_test = LocalMoranITest.create(X, spatial_weights, column_name="MEDIAN_INCOME") 
    
    # Print the Local Moran's I and p-values
    print("Local Moran's I = ", local_moran_test.i_list[:10]) 
    print("p_values = ", local_moran_test.p_values[:10])

    The output of the code is as shown:

    Local Moran's I =  [-0.28929661 -0.24813967  0.53874783  2.50789083  2.59829807  0.96529687
      0.62729663  0.79068262 -0.00862826 -0.11777731]
    p_values =  [0.025 0.003 0.001 0.001 0.001 0.019 0.015 0.088 0.336 0.119]
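
To illustrate how the local statistic separates clusters from outliers, the following NumPy sketch computes a local Moran's I per location and labels each location by quadrant: HH and LL for high or low values among similar neighbors, HL and LH for spatial outliers. It follows one common convention (row-standardized weights, with each cross-product scaled by the variance of the values). The function names and the toy data are hypothetical, and the scaling used by LocalMoranITest may differ.

```python
import numpy as np

def local_morans_i(values, w):
    """Local Moran's I per location: z_i * (spatial lag of z)_i / variance of z."""
    z = np.asarray(values, dtype=float)
    z = z - z.mean()
    lag = w @ z                # weighted average of the neighbors' deviations
    m2 = (z @ z) / len(z)      # variance of the values
    return z * lag / m2, z, lag

def quadrant_labels(z, lag):
    """First letter: the location's own value; second letter: its neighbors' average."""
    return [
        ("H" if zi > 0 else "L") + ("H" if li > 0 else "L")
        for zi, li in zip(z, lag)
    ]

# Toy row-standardized weights: six locations in a row, linked to immediate neighbors
n = 6
w = np.zeros((n, n))
for i in range(n - 1):
    w[i, i + 1] = w[i + 1, i] = 1.0
w /= w.sum(axis=1, keepdims=True)

# Made-up values: one low value surrounded by high values
values = [10, 10, 10, 1, 10, 10]
local_i, z, lag = local_morans_i(values, w)
labels = quadrant_labels(z, lag)

for i, (li, label) in enumerate(zip(local_i, labels)):
    print(i, round(float(li), 3), label)
```

In practice, only locations whose p-value falls below the chosen significance level would be reported as clusters or outliers.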