Centrality Algorithms

16.6.3 Centrality Algorithms

Centrality algorithms are used to assess the importance or influence of nodes within a graph.

These algorithms can be broadly classified into three categories based on their locality and focus.

Table 16-12 Classification of Centrality Algorithms

Category	Description	Example Algorithms
Local Measures	These algorithms evaluate a node's importance based on its immediate connections, offering straightforward and computationally efficient insights.	Degree Centrality
Global Path-Based Measures	These algorithms evaluate nodes based on their overall connectivity and shortest paths within the entire graph.	Closeness Centrality, Harmonic Centrality, Betweenness Centrality
Global Influence-Based Measures	These algorithms evaluate a node's influence based on their direct and indirect connections, offering in-depth insights into hierarchical importance and influence within complex graphs.	Eigenvector centrality, Random Walk with Restart (RWR), PageRank, ArticleRank, Hyperlink-Induced Topic Search (HITS), Stochastic Approach for Link-Structure Analysis (SALSA)

Learn more about the Centrality algorithms in the following topics.

Degree Centrality
The Degree Centrality algorithm measures the number of direct connections each node has in a graph, indicating its immediate level of influence or prominence within the graph.
Closeness Centrality
The Closeness Centrality algorithm identifies nodes that can quickly reach all other nodes, highlighting efficient communicators or spreaders of information.
Harmonic Centrality
The Harmonic Centrality algorithm improves closeness centrality to better account for disconnected graphs.
Vertex Betweenness Centrality
The Vertex Betweenness Centrality algorithm identifies nodes that act as critical bridges, controlling the flow of information or resources through the graph.
PageRank
PageRank assigns a numerical weight to each vertex, measuring its relative importance within the graph.

Parent topic: Executing Built-in Algorithms

16.6.3.1 Degree Centrality

The Degree Centrality algorithm measures the number of direct connections each node has in a graph, indicating its immediate level of influence or prominence within the graph.

This algorithm can be applied in the following scenarios:

To quickly gain local insights into node importance based on direct connections.
To understand immediate influence or activity levels.
Social networking analysis, such as identifying users with most friends or followers.
Financial transactions analysis, such as detecting accounts with the highest number of transactions.
Studying spreading of diseases by pinpointing individuals who have direct contact with most people.

The following variants of Degree Centrality algorithms are supported:

Degree Centrality: Returns the sum of the number of incoming and outgoing edges for each vertex in the graph.
In-Degree Centrality: Returns the sum of the number of incoming edges for each vertex in the graph.
Out-Degree Centrality: Returns the sum of the number of outgoing edges for each vertex in the graph.

See the Javadoc and Python API Reference for more information on the corresponding APIs for running these algorithms.

Example 16-2 Running the In-Degree Centrality Algorithm

The following example runs the in-degree centrality algorithm on BANK_GRAPH to identify the top 10 accounts having the maximum number of incoming transactions.

opg4j> var graph = session.readGraphByName("BANK_GRAPH",GraphSource.PG_PGQL)
graph ==> PgxGraph[name=BANK_GRAPH,N=1000,E=4996,created=1643308582055]
opg4j> var a = session.createAnalyst()
a ==> NamedArgumentAnalyst[session=4c054326-600d-47d3-ab40-36b41fa0e339]
opg4j> a.inDegreeCentrality(graph)
$3 ==> VertexProperty[name=in_degree,type=integer,graph=BANK_GRAPH_PGQL]
opg4j> graph.queryPgql("SELECT DISTINCT m.id, m.in_degree FROM MATCH (m:accounts) -[e:transfers]-> (n:accounts) ORDER BY m.in_degree DESC LIMIT 10").print()
+-----------------+
| id  | in_degree |
+-----------------+
| 387 | 39        |
| 934 | 39        |
| 135 | 36        |
| 534 | 32        |
| 380 | 31        |
| 330 | 30        |
| 406 | 28        |
| 746 | 28        |
| 259 | 26        |
| 352 | 26        |
+-----------------+
$5 ==> PgqlResultSetImpl[graph=BANK_GRAPH_PGQL,numResults=10]

PgxGraph graph = session.readGraphByName("BANK_GRAPH",GraphSource.PG_PGQL);
Analyst a = session.createAnalyst();
a.inDegreeCentrality(graph);
PgqlResultSet rs = graph.queryPgql("SELECT DISTINCT m.id, m.in_degree FROM MATCH (m:accounts) -[e:transfers]-> (n:accounts) ORDER BY m.in_degree DESC LIMIT 10");
rs.print();

>>> graph = session.read_graph_by_name('BANK_GRAPH', 'pg_pgql')
>>> a = session.create_analyst()
>>> a.in_degree_centrality(graph)
VertexProperty(name: in_degree, type: integer, graph: BANK_GRAPH_PGQL)
>>> graph.query_pgql("SELECT DISTINCT m.id, m.in_degree FROM MATCH (m:accounts) -[e:transfers]-> (n:accounts) ORDER BY m.in_degree DESC LIMIT 10").print()
+-----------------+
| id  | in_degree |
+-----------------+
| 387 | 39        |
| 934 | 39        |
| 135 | 36        |
| 534 | 32        |
| 380 | 31        |
| 330 | 30        |
| 406 | 28        |
| 746 | 28        |
| 259 | 26        |
| 352 | 26        |
+-----------------+

Parent topic: Centrality Algorithms

16.6.3.2 Closeness Centrality

The Closeness Centrality algorithm identifies nodes that can quickly reach all other nodes, highlighting efficient communicators or spreaders of information.

This algorithm can be applied in the following scenarios:

To determine nodes that are central to the overall structure of the graph, facilitating efficient information flow or influence spreading.
In social network analysis, for example, finding influencers who can disseminate information quickly to the entire network.
In financial transactions analysis, for example, accounts with high closeness centrality might facilitate rapid money movement, making them potential candidates for closer monitoring for fraudulent activities such as money laundering.
In studying spreading of diseases, for example, identifying individuals who can potentially spread a disease to many others quickly due to their central position in the network.

The following two variants are supported for Closeness Centrality:

Closeness Centrality (Unit Length): The Closeness Centrality of a node v is the reciprocal of the sum of all the distances from the possible shortest paths starting from v. Thus the higher the centrality value of v, the closer it is to all the other vertices in the graph.
Closeness Centrality (with weights): This takes into account the weights from the edges when computing the reciprocal of the sum of all the distances from the possible shortest paths starting from the vertex v, for every vertex in the graph. The weights of the edges must be positive values greater than 0.

See the Javadoc and Python API Reference for more information on the corresponding APIs for running these algorithms.

Example 16-3 Running the Closeness Centrality Algorithm

The following example runs the Closeness Centrality algorithm on BANK_GRAPH to identify the top five accounts that have higher levels of connections with the other accounts.

opg4j> var graph = session.readGraphByName("BANK_GRAPH",GraphSource.PG_PGQL)
graph ==> PgxGraph[name=BANK_GRAPH,N=1000,E=4996,created=1643308582055]
opg4j> var a = session.createAnalyst()
a ==> NamedArgumentAnalyst[session=4c054326-600d-47d3-ab40-36b41fa0e339]
opg4j> a.closenessCentralityUnitLength(graph)
$6 ==> VertexProperty[name=closeness,type=double,graph=BANK_GRAPH_PGQL]
opg4j> graph.queryPgql("SELECT DISTINCT m.id, m.closeness FROM MATCH (m:accounts) -[e:transfers]-> (n:accounts) ORDER BY m.closeness DESC LIMIT 5").print()
+-----------------------------+
| id  | closeness             |
+-----------------------------+
| 934 | 3.866976024748647E-4  |
| 135 | 3.8595137012736397E-4 |
| 387 | 3.8476337052712584E-4 |
| 406 | 3.8284839203675346E-4 |
| 330 | 3.7425149700598805E-4 |
+-----------------------------+
$7 ==> PgqlResultSetImpl[graph=BANK_GRAPH_PGQL,numResults=5]

PgxGraph graph = session.readGraphByName("BANK_GRAPH",GraphSource.PG_PGQL);
Analyst a = session.createAnalyst();
a.closenessCentralityUnitLength(graph);
PgqlResultSet rs = graph.queryPgql("SELECT DISTINCT m.id, m.closeness FROM MATCH (m:accounts) -[e:transfers]-> (n:accounts) ORDER BY m.closeness DESC LIMIT 5");
rs.print();

>>> graph = session.read_graph_by_name('BANK_GRAPH', 'pg_pgql')
>>> a = session.create_analyst()
>>> a.closeness_centrality(graph)
VertexProperty(name: closeness, type: double, graph: BANK_GRAPH_PGQL)
>>> graph.query_pgql("SELECT DISTINCT m.id, m.closeness FROM MATCH (m:accounts) -[e:transfers]-> (n:accounts) ORDER BY m.closeness DESC LIMIT 5").print()
+-----------------------------+
| id  | closeness             |
+-----------------------------+
| 934 | 3.866976024748647E-4  |
| 135 | 3.8595137012736397E-4 |
| 387 | 3.8476337052712584E-4 |
| 406 | 3.8284839203675346E-4 |
| 330 | 3.7425149700598805E-4 |
+-----------------------------+

Parent topic: Centrality Algorithms

16.6.3.3 Harmonic Centrality

The Harmonic Centrality algorithm improves closeness centrality to better account for disconnected graphs.

This algorithm can be applied in the following scenarios for a disconnected graph:

To determine nodes that are central to the overall structure of the graph, facilitating efficient information flow or influence spreading.
In social network analysis, for example, finding influencers who can disseminate information quickly to the entire network.
In financial transactions analysis, for example, accounts with high closeness centrality might facilitate rapid money movement, making them potential candidates for closer monitoring for fraudulent activities such as money laundering.
In studying spreading of diseases, for example, identifying individuals who can potentially spread a disease to many others quickly due to their central position in the network.

See the Javadoc and Python API Reference for more information on the corresponding APIs for running these algorithms.

Example 16-4 Running the Harmonic Centrality Algorithm

The following example measures the harmonic centrality value for each vertex account in BANK_GRAPH and prints the top five accounts that have higher levels of connections with the other accounts.

opg4j> var graph = session.readGraphByName("BANK_GRAPH",GraphSource.PG_PGQL)
graph ==> PgxGraph[name=BANK_GRAPH,N=1000,E=4996,created=1643308582055]
opg4j> var a = session.createAnalyst()
a ==> NamedArgumentAnalyst[session=4c054326-600d-47d3-ab40-36b41fa0e339]
opg4j> a.harmonicCentrality(graph)
VertexProperty[name=hc,type=double,graph=BANK_GRAPH_PGQL]
opg4j> graph.queryPgql("SELECT DISTINCT m.id, m.hc FROM MATCH (m:accounts) -[e:transfers]-> (n:accounts) ORDER BY m.hc DESC LIMIT 5").print()
+--------------------------+
| id  | hc                 |
+--------------------------+
| 34  | 193.53134920634574 |
| 770 | 193.5238095238061  |
| 778 | 193.41904761904416 |
| 262 | 193.32936507936165 |
| 243 | 192.78293650793313 |
+--------------------------+
$9 ==> PgqlResultSetImpl[graph=BANK_GRAPH_PGQL,numResults=5]

PgxGraph graph = session.readGraphByName("BANK_GRAPH",GraphSource.PG_PGQL);
Analyst a = session.createAnalyst();
a.harmonicCentrality(graph);
PgqlResultSet rs = g1.queryPgql("SELECT DISTINCT m.id, m.hc FROM MATCH (m:accounts) -[e:transfers]-> (n:accounts) ORDER BY m.hc DESC LIMIT 5");
rs.print();

>>> graph = session.read_graph_by_name('BANK_GRAPH', 'pg_pgql')
>>> a = session.create_analyst()
>>> a.harmonic_centrality(graph)
VertexProperty(name: harmonic_centrality, type: double, graph: BANK_GRAPH_PGQL)
>>> graph.query_pgql("SELECT DISTINCT m.id, m.harmonic_centrality FROM MATCH (m:accounts) -[e:transfers]-> (n:accounts) ORDER BY m.harmonic_centrality DESC LIMIT 5").print()
+---------------------------+
| id  | harmonic_centrality |
+---------------------------+
| 34  | 193.3884920634886   |
| 262 | 193.32936507936165  |
| 56  | 193.10158730158386  |
| 544 | 192.87738095237754  |
| 408 | 192.73452380952043  |
+---------------------------+

Parent topic: Centrality Algorithms

16.6.3.4 Vertex Betweenness Centrality

The Vertex Betweenness Centrality algorithm identifies nodes that act as critical bridges, controlling the flow of information or resources through the graph.

This algorithm can be applied in the following scenarios:

To identifying nodes that are strategically positioned for mediating interactions within the graph.
In social network analysis, for example, highlighting users who act as connectors between different groups or communities.
In financial transactions analysis, for example, detecting accounts that facilitate transactions between otherwise unconnected parts of the financial network.
In studying spreading of diseases, for example, identifying individuals who serve as bridges between different social groups, potentially controlling the spread of a disease across the entire population.

The following three variants are supported for Vertex Betweenness Centrality:

Vertex Betweenness Centrality: The Betweenness Centrality of a vertex v in a graph is the sum of the fraction of shortest paths that pass through v from all the possible shortest paths connecting every possible pair of vertices s, t in the graph, such that v is different from s and t. This algorithm applies for connected graphs.
Approximate Vertex Betweenness Centrality with Random Seeds: This variant of betweenness centrality approximates the centrality of the vertices by just using k random vertices as starting points for the BFS traversals of the graph, instead of computing the exact value by using all the vertices in the graph.
Approximate Vertex Betweenness Centrality From seeds: This variant of betweenness centrality approximates the centrality of the vertices by just using the vertices from the given sequence as starting points for the BFS traversals of the graph, instead of computing the exact value by using all the vertices in the graph.

See the Javadoc and Python API Reference for more information on the corresponding APIs for running these algorithms.

Example 16-5 Running the Betweenness Centrality Algorithm

The following example identifies the top five accounts that act as critical bridges in graph g1.

opg4j> var graph = session.readGraphByName("BANK_GRAPH",GraphSource.PG_PGQL)
graph ==> PgxGraph[name=BANK_GRAPH,N=1000,E=4996,created=1643308582055]
opg4j> var a = session.createAnalyst()
a ==> NamedArgumentAnalyst[session=4c054326-600d-47d3-ab40-36b41fa0e339]
opg4j> a.vertexBetweennessCentrality(graph)
$10 ==> VertexProperty[name=betweenness,type=double,graph=BANK_GRAPH_PGQL]
opg4j> graph.queryPgql("SELECT DISTINCT m.id, m.betweenness FROM MATCH (m:accounts) -[e:transfers]-> (n:accounts) ORDER BY m.betweenness DESC LIMIT 5").print()
+--------------------------+
| id  | betweenness        |
+--------------------------+
| 387 | 18913.34886094081  |
| 352 | 16625.818593102595 |
| 135 | 15190.461087012543 |
| 934 | 14642.317059371073 |
| 222 | 13688.935057639192 |
+--------------------------+
$11 ==> PgqlResultSetImpl[graph=BANK_GRAPH_PGQL,numResults=5]

PgxGraph graph = session.readGraphByName("BANK_GRAPH",GraphSource.PG_PGQL);
Analyst a = session.createAnalyst();
a.vertexBetweennessCentrality(graph);
PgqlResultSet rs = g1.queryPgql("SELECT DISTINCT m.id, m.betweenness FROM MATCH (m:accounts) -[e:transfers]-> (n:accounts) ORDER BY m.betweenness DESC LIMIT 5");
rs.print();

>>> graph = session.read_graph_by_name('BANK_GRAPH', 'pg_pgql')
>>> a = session.create_analyst()
>>> a.vertex_betweenness_centrality(graph)
VertexProperty(name: betweenness, type: double, graph: BANK_GRAPH_PGQL)
>>> graph.query_pgql("SELECT DISTINCT m.id, m.betweenness FROM MATCH (m:accounts) -[e:transfers]-> (n:accounts) ORDER BY m.betweenness DESC LIMIT 5").print()
+--------------------------+
| id  | betweenness        |
+--------------------------+
| 387 | 18913.34886094081  |
| 352 | 16625.818593102595 |
| 135 | 15190.461087012543 |
| 934 | 14642.317059371073 |
| 222 | 13688.935057639192 |
+--------------------------+

Parent topic: Centrality Algorithms

16.6.3.5 PageRank

PageRank assigns a numerical weight to each vertex, measuring its relative importance within the graph.

This algorithm can be applied in the following scenarios:

To identify nodes that directly and indirectly influence other nodes.
In place of Eigenvector centrality, when the underlying process includes randomness, for example, random browsing.
In social networks analysis, for example, finding influencers with direct and indirect influence within the graph.
In disease spreading analysis, for example, assessing the risk of individuals in spreading a disease.

PageRank computes a rank value between 0 and 1 for each vertex (node) in the graph and stores the values in a double property. The algorithm therefore creates a vertex property of type double for the output.

In the graph server (PGX), there are two types of vertex and edge properties:

Persistent Properties: Properties that are loaded with the graph from a data source are fixed, in-memory copies of the data on disk, and are therefore persistent. Persistent properties are read-only, immutable and shared between sessions.
Transient Properties: Values can only be written to transient properties, which are private to a session. You can create transient properties by calling createVertexProperty and createEdgeProperty on PgxGraph objects, or by copying existing properties using clone() on Property objects.

Transient properties hold the results of computation by algorithms. For example, the PageRank algorithm computes a rank value between 0 and 1 for each vertex in the graph and stores these values in a transient property named pg_rank. Transient properties are destroyed when the Analyst object is destroyed.

The following variants of PageRank algorithms are supported:

Classic PageRank: Computes ranking scores for the vertices using the network created by the incoming edges in the graph. Although it is intended for directed graphs, undirected graphs can be treated as well by converting them into directed graphs with reciprocated edges (that is, keeping the original edge and creating a second one going in the opposite direction).
Approximate PageRank: Computes the ranking scores for the vertices in similar way to the classic algorithm without normalization and with a more relaxed convergence criteria since the tolerated error value is compared against each single vertex in the graph, instead of looking at the cumulative vertex error. Thus this variant will converge faster than the classic algorithm, but the ranking values might not be as accurate as in the classic implementation.
Weighted PageRank: Similar to the classic PageRank algorithm, except that it allows for a weight value to be assigned to each edge. This weight determines the fraction of the PageRank score that will flow from the source vertex through the current edge to its destination vertex.

See the Javadoc and Python API Reference for more information on the corresponding APIs for running these algorithms.

Example 16-6 Running the PageRank Algorithm

The following example runs the PageRank algorithm on BANK_GRAPH to identify the top five accounts with the highest PageRank values. The PageRank algorithm uses the following default values for the input parameters: error (tolerance = 0.001), damping factor = 0.85, and maximum number of iterations = 100.

opg4j> var graph = session.readGraphByName("BANK_GRAPH",GraphSource.PG_PGQL)
graph ==> PgxGraph[name=BANK_GRAPH,N=1000,E=4996,created=1643308582055]
opg4j> var a = session.createAnalyst()
a ==> NamedArgumentAnalyst[session=4c054326-600d-47d3-ab40-36b41fa0e339]
opg4j> a.pagerank(graph, 0.001, 0.85, 100)
$12 ==> VertexProperty[name=pagerank,type=double,graph=BANK_GRAPH_PGQL]
opg4j> graph.queryPgql("SELECT DISTINCT m.id, m.pagerank FROM MATCH (m:accounts) -[e:transfers]-> (n:accounts) ORDER BY m.pagerank DESC LIMIT 5").print()
+-----------------------------+
| id  | pagerank              |
+-----------------------------+
| 387 | 0.0073028362522059255 |
| 406 | 0.0067344306145590786 |
| 135 | 0.006725965475577352  |
| 934 | 0.0066413407648344865 |
| 397 | 0.0057016075312134595 |
+-----------------------------+
$13 ==> PgqlResultSetImpl[graph=BANK_GRAPH_PGQL,numResults=5]

PgxGraph graph = session.readGraphByName("BANK_GRAPH",GraphSource.PG_PGQL);
Analyst a = session.createAnalyst();
a.pagerank(graph, 0.001, 0.85, 100);
PgqlResultSet rs = g1.queryPgql("SELECT DISTINCT m.id, m.pagerank FROM MATCH (m:accounts) -[e:transfers]-> (n:accounts) ORDER BY m.pagerank DESC LIMIT 5");
rs.print();

>>> graph = session.read_graph_by_name('BANK_GRAPH', 'pg_pgql')
>>> a = session.create_analyst()
>>> a.pagerank(graph, 0.001, 0.85, 100)
VertexProperty(name: pagerank, type: double, graph: BANK_GRAPH_PGQL)
>>> graph.query_pgql("SELECT DISTINCT m.id, m.pagerank FROM MATCH (m:accounts) -[e:transfers]-> (n:accounts) ORDER BY m.pagerank DESC LIMIT 5").print()
+-----------------------------+
| id  | pagerank              |
+-----------------------------+
| 387 | 0.0073028362522059255 |
| 406 | 0.0067344306145590786 |
| 135 | 0.006725965475577352  |
| 934 | 0.0066413407648344865 |
| 397 | 0.0057016075312134595 |
+-----------------------------+

Parent topic: Centrality Algorithms