17.4 Using the Unsupervised GraphWise Algorithm (Vertex Embeddings)
Unsupervised GraphWise is an unsupervised inductive vertex representation learning algorithm that can leverage vertex information. The learned embeddings can be used in various downstream tasks, including vertex classification, vertex clustering, and similar vertex search.
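The following Python sketch illustrates similar vertex search by ranking vertices on the cosine similarity of their embeddings. It does not use the PGX API. The embedding values, the vertex names, and the helpers `cosine` and `most_similar` are hypothetical, and cosine similarity is only one common choice of similarity measure.

```python
import math

def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity between two embedding vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def most_similar(query: str, embeddings: dict[str, list[float]], k: int = 2) -> list[str]:
    """Return the k vertices whose embeddings are most similar to the query vertex."""
    q = embeddings[query]
    others = [v for v in embeddings if v != query]
    others.sort(key=lambda v: cosine(q, embeddings[v]), reverse=True)
    return others[:k]

# Hypothetical embeddings for four vertices (toy values, not model output).
emb = {
    "a": [1.0, 0.0, 0.1],
    "b": [0.9, 0.1, 0.0],
    "c": [0.0, 1.0, 0.0],
    "d": [0.1, 0.9, 0.2],
}
print(most_similar("a", emb, k=1))  # → ['b']
```

In practice, the embeddings would come from a trained model and the search would typically use an index, not a linear scan.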
Unsupervised GraphWise is based on Deep Graph Infomax (DGI) by Veličković et al.
Model Structure
An Unsupervised GraphWise model consists of graph convolutional layers followed by an embedding layer, which defaults to a DGI layer.
The forward pass through a convolutional layer for a vertex proceeds as follows:
- A set of neighbors of the vertex is sampled.
- The previous layer representations of the neighbors are mean-aggregated, and the aggregated features are concatenated with the previous layer representation of the vertex.
- This concatenated vector is multiplied by a weight matrix, and a bias vector is added.
- The result is normalized such that the layer output has unit norm.
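The four steps above can be sketched for a single vertex in plain Python. This is a minimal illustration and not the PGX implementation. The function `conv_layer` and its parameters are hypothetical. Sampling without replacement and using a zero vector for vertices without neighbors are assumptions. Activation functions, batching, and training are omitted.

```python
import math
import random

def conv_layer(vertex, prev, neighbors, weights, bias, num_samples, rng):
    """Hypothetical single-vertex forward pass through one convolutional layer."""
    h_self = prev[vertex]
    dim = len(h_self)
    nbrs = neighbors.get(vertex, [])
    # 1. Sample up to num_samples neighbors (without replacement: an assumption).
    sampled = rng.sample(nbrs, min(num_samples, len(nbrs)))
    # 2. Mean-aggregate the neighbors' previous-layer representations
    #    (zero vector if the vertex has no neighbors: an assumption) ...
    if sampled:
        agg = [sum(prev[n][i] for n in sampled) / len(sampled) for i in range(dim)]
    else:
        agg = [0.0] * dim
    # ... and concatenate them with the vertex's own previous-layer representation.
    concat = h_self + agg
    # 3. Multiply by the weight matrix (out_dim x 2*dim) and add the bias vector.
    out = [sum(w * x for w, x in zip(row, concat)) + b for row, b in zip(weights, bias)]
    # 4. Normalize the output to unit L2 norm (left unchanged if it is all zeros).
    norm = math.sqrt(sum(o * o for o in out))
    return [o / norm for o in out] if norm > 0 else out

rng = random.Random(42)
prev = {"v": [1.0, 0.0], "u": [0.0, 1.0], "w": [0.0, 3.0]}
neighbors = {"v": ["u", "w"]}
weights = [[1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 1.0]]  # 2 x 4: out_dim x (2 * in_dim)
bias = [0.0, 0.0]
h = conv_layer("v", prev, neighbors, weights, bias, num_samples=2, rng=rng)
print(h)
```

Here the neighbor mean is [0.0, 2.0]. The weights select the first and last entries of the concatenated vector [1.0, 0.0, 0.0, 2.0], giving [1.0, 2.0] before normalization.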
The DGI layer consists of three parts that enable unsupervised learning from the embeddings produced by the convolutional layers.
- Corruption function: Shuffles the vertex features while preserving the graph structure. The convolutional layers are then applied to this corrupted input to produce negative embedding samples.
- Readout function: Sigmoid-activated mean of the embeddings, used as a summary of the graph.
- Discriminator: Measures the similarity of the positive (unshuffled) embeddings to the summary and the similarity of the negative samples to the summary. The loss function is computed from these similarities.
Since none of these parts has adjustable hyperparameters, the DGI layer cannot be customized.
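The three parts of the DGI layer can be sketched in plain Python. This minimal illustration follows the formulation in the DGI paper: a bilinear discriminator sigmoid(h^T W s) trained with binary cross-entropy. Whether PGX uses exactly this discriminator is an assumption, and all function names are hypothetical. For brevity, the convolutional encoder is replaced by the identity, so the shuffled features serve directly as the negative embeddings.

```python
import math
import random

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def corrupt(features, rng):
    """Corruption: shuffle feature rows across vertices; the graph structure is untouched."""
    perm = list(range(len(features)))
    rng.shuffle(perm)
    return [features[p] for p in perm]

def readout(embeddings):
    """Readout: sigmoid of the element-wise mean of the embeddings (graph summary)."""
    n, dim = len(embeddings), len(embeddings[0])
    return [sigmoid(sum(e[i] for e in embeddings) / n) for i in range(dim)]

def discriminate(h, s, W):
    """Bilinear discriminator sigmoid(h^T W s), as in the DGI paper (assumed for PGX)."""
    Ws = [sum(W[i][j] * s[j] for j in range(len(s))) for i in range(len(W))]
    return sigmoid(sum(hi * wi for hi, wi in zip(h, Ws)))

def dgi_loss(pos, neg, W):
    """Binary cross-entropy: positive embeddings labeled 1, negative embeddings labeled 0."""
    s = readout(pos)
    eps = 1e-12  # guards against log(0)
    loss = -sum(math.log(discriminate(h, s, W) + eps) for h in pos)
    loss -= sum(math.log(1.0 - discriminate(h, s, W) + eps) for h in neg)
    return loss / (len(pos) + len(neg))

rng = random.Random(0)
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
pos = features                # identity "encoder" standing in for the conv layers
neg = corrupt(features, rng)  # same rows, shuffled across vertices
W = [[1.0, 0.0], [0.0, 1.0]]
print(dgi_loss(pos, neg, W))
```

Training would update the encoder and W to minimize this loss. The discriminator thereby learns to tell real vertex embeddings from embeddings of the corrupted graph.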
The second embedding layer available is the Dominant Layer.
Dominant is a model that detects anomalies based on the vertex features and the structure of the neighborhood. In an autoencoder setting, it uses GCNs to reconstruct the features, and it reconstructs the structure (the adjacency mask) from the dot products of the embeddings.
The loss function is computed from the feature reconstruction loss and the structure reconstruction loss. The importance given to features or to the structure can be tuned with the alpha hyperparameter.
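The Dominant loss can be sketched in plain Python. This minimal illustration makes several assumptions. Alpha weights the feature reconstruction error and (1 - alpha) weights the structure reconstruction error. Both errors are squared Frobenius norms. Vertices are scored by their per-vertex reconstruction errors. The PGX convention may differ, and all function names are hypothetical. The feature reconstruction X_hat is taken as given instead of being produced by a GCN decoder.

```python
import math

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def reconstruct_structure(Z):
    """Structure decoder: sigmoid of the pairwise dot products of the embeddings."""
    return [[sigmoid(sum(a * b for a, b in zip(zi, zj))) for zj in Z] for zi in Z]

def row_sq_errors(M, M_hat):
    """Squared reconstruction error of each row (one row per vertex)."""
    return [sum((a - b) ** 2 for a, b in zip(r, r_hat)) for r, r_hat in zip(M, M_hat)]

def dominant_loss(X, X_hat, A, Z, alpha):
    """alpha * feature error + (1 - alpha) * structure error (assumed weighting)."""
    feat = sum(row_sq_errors(X, X_hat))                      # ||X - X_hat||_F^2
    struct = sum(row_sq_errors(A, reconstruct_structure(Z)))  # ||A - A_hat||_F^2
    return alpha * feat + (1 - alpha) * struct

def anomaly_scores(X, X_hat, A, Z, alpha):
    """Per-vertex score: vertices that are reconstructed poorly look anomalous."""
    A_hat = reconstruct_structure(Z)
    return [alpha * math.sqrt(f) + (1 - alpha) * math.sqrt(s)
            for f, s in zip(row_sq_errors(X, X_hat), row_sq_errors(A, A_hat))]

# Toy example: one vertex without a self-loop and with a zero embedding.
X, X_hat = [[1.0, 0.0]], [[0.0, 0.0]]
A, Z = [[0.0]], [[0.0]]
print(dominant_loss(X, X_hat, A, Z, alpha=0.5))  # → 0.625
```

Here the feature error is 1.0 and the reconstructed adjacency entry is sigmoid(0) = 0.5, so the structure error is 0.25. With alpha = 0.5, these combine to 0.625.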
The following describes a few use cases where the Unsupervised GraphWise algorithm can be applied:
- Fraud Detection in Financial Transactions: To identify clusters of fraudulent activities by analyzing the transaction network and generating embeddings for accounts or transactions. This helps in detecting unknown patterns of fraud.
- Network Optimization: To optimize network performance in telecommunications by clustering the network nodes (such as routers or cell towers) based on traffic patterns. This helps to improve data flow and reduce latency.
- Bioinformatics: To analyze protein-protein interaction networks and discover new clusters or communities of proteins that might share similar functions. This helps in drug discovery and understanding biological processes.
The following describes the usage of the main functionalities of the implementation of DGI in PGX, using the Cora graph as an example.
- Loading a Graph
- Building a Minimal Unsupervised GraphWise Model
- Advanced Hyperparameter Customization
- Supported Property Types for Unsupervised GraphWise Model
- Building an Unsupervised GraphWise Model Using Partitioned Graphs
- Training an Unsupervised GraphWise Model
- Getting the Loss Value for an Unsupervised GraphWise Model
- Getting the Training Log for an Unsupervised GraphWise Model
- Inferring Embeddings for an Unsupervised GraphWise Model
- Classifying the Vertices Using the Obtained Embeddings
- Storing an Unsupervised GraphWise Model
- Loading a Pre-Trained Unsupervised GraphWise Model
- Destroying an Unsupervised GraphWise Model
- Explaining a Prediction for an Unsupervised GraphWise Model