17.7 Using the Pg2vec Algorithm
Pg2vec learns representations of graphlets (partitions inside a graph) by employing edges as the principal learning units, thereby packing more information into each learning unit (compared to employing vertices as learning units) for the representation learning task.
It consists of three main steps:
- Random walks are generated for each vertex (with a pre-defined length per walk and a pre-defined number of walks per vertex).
- Each edge in a random walk is mapped to a property.edge-word in the created document (with the graph ID as the document label), where the property.edge-word is defined as the concatenation of the properties of the source and destination vertices.
- The generated documents (with their attached document labels) are fed to a Doc2Vec algorithm, which generates a vector representation for each document (that is, for each graph).
Pg2vec creates graphlet embeddings for a specific set of graphlets and cannot be updated to incorporate modifications to these graphlets. Instead, a new Pg2vec model should be trained on the modified graphlets.
The memory consumption of the Pg2vec algorithm is O(2(n+m)*d), where:
- n: the number of vertices in the graph
- m: the number of graphlets in the graph
- d: the embedding length
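As a rough illustration of this bound, the number of stored floating-point values can be evaluated directly (the input sizes below are hypothetical, chosen only for the arithmetic):

```python
def pg2vec_memory_units(n, m, d):
    """Number of floating-point values implied by the O(2(n+m)*d) bound,
    for n vertices, m graphlets, and embedding length d."""
    return 2 * (n + m) * d

# Hypothetical sizes: 1,000 vertices, 100 graphlets, embedding length 64
units = pg2vec_memory_units(1000, 100, 64)  # 2 * 1100 * 64 = 140,800 values
# At 4 bytes per single-precision float, that is roughly 563 KB.
```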
The following describes a few use cases where the Pg2vec algorithm can be applied:
- Chemical Compound Analysis: Represent chemical compounds as graphs and use Pg2vec to find similarities between compounds. This helps in drug discovery and chemical research.
- Document Classification: Represent documents as graphs of words or phrases (for example, using dependency parsing in NLP) and classify them into topics or genres based on their embeddings.
- Network Comparison: To compare different social or biological networks by generating embeddings for entire graphs. This can be used to study the evolution of networks over time or to compare different species’ protein interaction networks in biology.
The following describes the usage of the main functionalities of the implementation of Pg2vec in PGX, using the NCI109 dataset (which contains 4127 graphs) as an example:
- Loading a Graph
- Building a Minimal Pg2vec Model
- Building a Customized Pg2vec Model
- Training a Pg2vec Model
- Getting the Loss Value For a Pg2vec Model
- Computing Similar Graphlets for a Given Graphlet
- Computing Similars for a Graphlet Batch
- Inferring a Graphlet Vector
- Inferring Vectors for a Graphlet Batch
- Storing a Trained Pg2vec Model
- Loading a Pre-Trained Pg2vec Model
- Destroying a Pg2vec Model
Parent topic: Using the Machine Learning Library (PgxML) for Graphs