clustering

This example shows how to run the agglomerative clustering algorithm with regionalization on a given dataset, specifying the number of clusters and the type of spatial weights.

The clustering algorithm is set in the method parameter, the number of clusters in the n_clusters parameter, and the spatial weights in the weights_def parameter. The features used for clustering are listed in the columns parameter.

select *
    from table( 
        pyqEval(
            '{  
                "oml_connect": true, 
                "table": "oml_user.la_block_groups",
                "columns": ["median_income"],
                "method": "AGGLOMERATIVE",
                "n_clusters": 6,
                "key_column": "geoid",
                "weights_def": {"type": "Queen"}
            }',
            '{ "geoid": "VARCHAR2(50)", "label": "NUMBER" }',
            'clustering'
        )
    );

The result contains the index column specified in the key_column parameter and a label for each row, indicating the cluster to which that row belongs.
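The parameter string passed to pyqEval in the query above is plain JSON, so it can be generated and checked before it is pasted into the SQL statement. The following is a minimal Python sketch of that idea. The helper name build_clustering_params and its validation rules are hypothetical and not part of any Oracle API; only the JSON key names and values are taken from the example above.

```python
import json


def build_clustering_params(table, columns, n_clusters, key_column,
                            weights_type="Queen", method="AGGLOMERATIVE"):
    """Return the JSON text used as the parameter argument of pyqEval.

    Hypothetical helper: only the key names come from the SQL example.
    """
    if not columns:
        raise ValueError("columns must list at least one feature")
    if isinstance(n_clusters, bool) or not isinstance(n_clusters, int) or n_clusters < 1:
        raise ValueError("n_clusters must be a positive integer")
    params = {
        "oml_connect": True,
        "table": table,
        "columns": list(columns),
        "method": method,
        "n_clusters": n_clusters,
        "key_column": key_column,
        "weights_def": {"type": weights_type},
    }
    # json.dumps guarantees balanced braces, double quotes, and lowercase true/false.
    return json.dumps(params, indent=4)


par_lst = build_clustering_params(
    table="oml_user.la_block_groups",
    columns=["median_income"],
    n_clusters=6,
    key_column="geoid",
)
print(par_lst)
```

Generating the string this way avoids hand-editing mistakes such as a trailing comma or single-quoted keys, which would make the JSON invalid inside the SQL literal.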



To visualize the clusters, select the IMAGE column and set the oml_graphics_flag parameter to true. In the following code, the plot parameter requests a basemap as the background of the plot, and the output format (out_fmt) is set to PNG.

select IMAGE
    from table(
        pyqEval(
            par_lst => '{
                "oml_connect": true,
                "oml_graphics_flag": true,
                "table": "oml_user.la_block_groups",
                "columns": ["median_income"],
                "method": "AGGLOMERATIVE",
                "n_clusters": 6,
                "key_column": "geoid",
                "weights_def": {"type": "Queen"},
                "plot": {"with_basemap": true}
            }',
            out_fmt => 'PNG',
            scr_name => 'clustering'
        )
    );

The result is a map with the observations colored according to the cluster to which they are assigned. The map shows six clusters, as specified in the n_clusters parameter. Because spatial weights are defined, the agglomerative clustering algorithm performs regionalization: observations assigned to the same cluster share similar characteristics and are geographically connected.
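To make the effect of the spatial weights concrete, the following is a minimal pure-Python sketch of agglomerative clustering restricted to neighboring observations. It is not the implementation behind the clustering script. It assumes a regular grid instead of block-group polygons, Queen contiguity (cells are neighbors when they share an edge or a corner), and a Ward-style merge cost. The function names queen_neighbors and regionalize are illustrative only.

```python
def queen_neighbors(n_rows, n_cols):
    """Map each cell index to the cells sharing an edge or a corner with it."""
    neighbors = {}
    for r in range(n_rows):
        for c in range(n_cols):
            adj = set()
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr, cc = r + dr, c + dc
                    if (dr or dc) and 0 <= rr < n_rows and 0 <= cc < n_cols:
                        adj.add(rr * n_cols + cc)
            neighbors[r * n_cols + c] = adj
    return neighbors


def regionalize(values, neighbors, n_clusters):
    """Merge adjacent clusters until n_clusters remain; return one label per value."""
    members = {i: [i] for i in range(len(values))}      # cluster id -> cell indices
    mean = {i: float(v) for i, v in enumerate(values)}  # cluster id -> feature mean
    adjacent = {i: set(nbrs) for i, nbrs in neighbors.items()}

    while len(members) > n_clusters:
        # Assumed Ward-style cost: growth in within-cluster variance if a and b merge.
        best = None
        for a in members:
            for b in adjacent[a]:
                if a < b:
                    na, nb = len(members[a]), len(members[b])
                    cost = na * nb / (na + nb) * (mean[a] - mean[b]) ** 2
                    if best is None or cost < best[0]:
                        best = (cost, a, b)
        if best is None:
            break  # no adjacent clusters left, so no further merge is allowed
        _, a, b = best
        na, nb = len(members[a]), len(members[b])
        mean[a] = (na * mean[a] + nb * mean[b]) / (na + nb)
        members[a].extend(members.pop(b))
        del mean[b]
        # Cluster a inherits the neighbors of b; references to b now point to a.
        for c in adjacent.pop(b):
            adjacent[c].discard(b)
            if c != a:
                adjacent[c].add(a)
                adjacent[a].add(c)

    labels = [0] * len(values)
    for label, cells in enumerate(members.values()):
        for i in cells:
            labels[i] = label
    return labels


# Two low-value bands separated by a high-value column.
grid = [
    [10, 10, 90, 10, 10],
    [10, 10, 90, 10, 10],
    [10, 10, 90, 10, 10],
]
values = [v for row in grid for v in row]
labels = regionalize(values, queen_neighbors(3, 5), n_clusters=3)
for r in range(3):
    print(labels[r * 5:(r + 1) * 5])
```

In this toy grid the left and right bands have identical values, yet they end up in different clusters because the high-value column separates them and only neighboring clusters may merge. Clustering on the values alone, without the contiguity constraint, would have no reason to keep the two bands apart.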