16.6.3 Centrality Algorithms
Centrality algorithms are used to assess the importance or influence of nodes within a graph.
These algorithms can be broadly classified into three categories based on their locality and focus.
Table 16-12 Classification of Centrality Algorithms
Category | Description | Example Algorithms |
---|---|---|
Local Measures | These algorithms evaluate a node's importance based on its immediate connections, offering straightforward and computationally efficient insights. | Degree Centrality |
Global Path-Based Measures | These algorithms evaluate nodes based on their overall connectivity and shortest paths within the entire graph. | Closeness Centrality, Harmonic Centrality, Betweenness Centrality |
Global Influence-Based Measures | These algorithms evaluate a node's influence based on their direct and indirect connections, offering in-depth insights into hierarchical importance and influence within complex graphs. | Eigenvector centrality, Random Walk with Restart (RWR), PageRank, ArticleRank, Hyperlink-Induced Topic Search (HITS), Stochastic Approach for Link-Structure Analysis (SALSA) |
Learn more about the Centrality algorithms in the following topics.
- Degree Centrality
The Degree Centrality algorithm measures the number of direct connections each node has in a graph, indicating its immediate level of influence or prominence within the graph. - Closeness Centrality
The Closeness Centrality algorithm identifies nodes that can quickly reach all other nodes, highlighting efficient communicators or spreaders of information. - Harmonic Centrality
The Harmonic Centrality algorithm improves closeness centrality to better account for disconnected graphs. - Vertex Betweenness Centrality
The Vertex Betweenness Centrality algorithm identifies nodes that act as critical bridges, controlling the flow of information or resources through the graph. - PageRank
PageRank assigns a numerical weight to each vertex, measuring its relative importance within the graph.
Parent topic: Executing Built-in Algorithms
16.6.3.1 Degree Centrality
The Degree Centrality algorithm measures the number of direct connections each node has in a graph, indicating its immediate level of influence or prominence within the graph.
This algorithm can be applied in the following scenarios:
- To quickly gain local insights into node importance based on direct connections.
- To understand immediate influence or activity levels.
- Social networking analysis, such as identifying users with most friends or followers.
- Financial transactions analysis, such as detecting accounts with the highest number of transactions.
- Studying spreading of diseases by pinpointing individuals who have direct contact with most people.
The following variants of Degree Centrality algorithms are supported:
- Degree Centrality: Returns the sum of the number of incoming and outgoing edges for each vertex in the graph.
- In-Degree Centrality: Returns the sum of the number of incoming edges for each vertex in the graph.
- Out-Degree Centrality: Returns the sum of the number of outgoing edges for each vertex in the graph.
Example 16-2 Running the In-Degree Centrality Algorithm
The following example runs the in-degree centrality algorithm on
BANK_GRAPH
to identify the top 10 accounts having the maximum
number of incoming transactions.
opg4j> var graph = session.readGraphByName("BANK_GRAPH",GraphSource.PG_PGQL)
graph ==> PgxGraph[name=BANK_GRAPH,N=1000,E=4996,created=1643308582055]
opg4j> var a = session.createAnalyst()
a ==> NamedArgumentAnalyst[session=4c054326-600d-47d3-ab40-36b41fa0e339]
opg4j> a.inDegreeCentrality(graph)
$3 ==> VertexProperty[name=in_degree,type=integer,graph=BANK_GRAPH_PGQL]
opg4j> graph.queryPgql("SELECT DISTINCT m.id, m.in_degree FROM MATCH (m:accounts) -[e:transfers]-> (n:accounts) ORDER BY m.in_degree DESC LIMIT 10").print()
+-----------------+
| id | in_degree |
+-----------------+
| 387 | 39 |
| 934 | 39 |
| 135 | 36 |
| 534 | 32 |
| 380 | 31 |
| 330 | 30 |
| 406 | 28 |
| 746 | 28 |
| 259 | 26 |
| 352 | 26 |
+-----------------+
$5 ==> PgqlResultSetImpl[graph=BANK_GRAPH_PGQL,numResults=10]
PgxGraph graph = session.readGraphByName("BANK_GRAPH",GraphSource.PG_PGQL);
Analyst a = session.createAnalyst();
a.inDegreeCentrality(graph);
PgqlResultSet rs = graph.queryPgql("SELECT DISTINCT m.id, m.in_degree FROM MATCH (m:accounts) -[e:transfers]-> (n:accounts) ORDER BY m.in_degree DESC LIMIT 10");
rs.print();
>>> graph = session.read_graph_by_name('BANK_GRAPH', 'pg_pgql')
>>> a = session.create_analyst()
>>> a.in_degree_centrality(graph)
VertexProperty(name: in_degree, type: integer, graph: BANK_GRAPH_PGQL)
>>> graph.query_pgql("SELECT DISTINCT m.id, m.in_degree FROM MATCH (m:accounts) -[e:transfers]-> (n:accounts) ORDER BY m.in_degree DESC LIMIT 10").print()
+-----------------+
| id | in_degree |
+-----------------+
| 387 | 39 |
| 934 | 39 |
| 135 | 36 |
| 534 | 32 |
| 380 | 31 |
| 330 | 30 |
| 406 | 28 |
| 746 | 28 |
| 259 | 26 |
| 352 | 26 |
+-----------------+
Parent topic: Centrality Algorithms
16.6.3.2 Closeness Centrality
The Closeness Centrality algorithm identifies nodes that can quickly reach all other nodes, highlighting efficient communicators or spreaders of information.
This algorithm can be applied in the following scenarios:
- To determine nodes that are central to the overall structure of the graph, facilitating efficient information flow or influence spreading.
- In social network analysis, for example, finding influencers who can disseminate information quickly to the entire network.
- In financial transactions analysis, for example, accounts with high closeness centrality might facilitate rapid money movement, making them potential candidates for closer monitoring for fraudulent activities such as money laundering.
- In studying spreading of diseases, for example, identifying individuals who can potentially spread a disease to many others quickly due to their central position in the network.
The following two variants are supported for Closeness Centrality:
- Closeness Centrality (Unit Length): The Closeness Centrality of a
node
v
is the reciprocal of the sum of all the distances from the possible shortest paths starting fromv
. Thus the higher the centrality value ofv
, the closer it is to all the other vertices in the graph. - Closeness Centrality (with weights): This takes into account the
weights from the edges when computing the reciprocal of the sum of all the distances
from the possible shortest paths starting from the vertex
v
, for every vertex in the graph. The weights of the edges must be positive values greater than 0.
Example 16-3 Running the Closeness Centrality Algorithm
The following example runs the Closeness Centrality algorithm on
BANK_GRAPH
to identify the top five accounts that have higher
levels of connections with the other accounts.
opg4j> var graph = session.readGraphByName("BANK_GRAPH",GraphSource.PG_PGQL)
graph ==> PgxGraph[name=BANK_GRAPH,N=1000,E=4996,created=1643308582055]
opg4j> var a = session.createAnalyst()
a ==> NamedArgumentAnalyst[session=4c054326-600d-47d3-ab40-36b41fa0e339]
opg4j> a.closenessCentralityUnitLength(graph)
$6 ==> VertexProperty[name=closeness,type=double,graph=BANK_GRAPH_PGQL]
opg4j> graph.queryPgql("SELECT DISTINCT m.id, m.closeness FROM MATCH (m:accounts) -[e:transfers]-> (n:accounts) ORDER BY m.closeness DESC LIMIT 5").print()
+-----------------------------+
| id | closeness |
+-----------------------------+
| 934 | 3.866976024748647E-4 |
| 135 | 3.8595137012736397E-4 |
| 387 | 3.8476337052712584E-4 |
| 406 | 3.8284839203675346E-4 |
| 330 | 3.7425149700598805E-4 |
+-----------------------------+
$7 ==> PgqlResultSetImpl[graph=BANK_GRAPH_PGQL,numResults=5]
PgxGraph graph = session.readGraphByName("BANK_GRAPH",GraphSource.PG_PGQL);
Analyst a = session.createAnalyst();
a.closenessCentralityUnitLength(graph);
PgqlResultSet rs = graph.queryPgql("SELECT DISTINCT m.id, m.closeness FROM MATCH (m:accounts) -[e:transfers]-> (n:accounts) ORDER BY m.closeness DESC LIMIT 5");
rs.print();
>>> graph = session.read_graph_by_name('BANK_GRAPH', 'pg_pgql')
>>> a = session.create_analyst()
>>> a.closeness_centrality(graph)
VertexProperty(name: closeness, type: double, graph: BANK_GRAPH_PGQL)
>>> graph.query_pgql("SELECT DISTINCT m.id, m.closeness FROM MATCH (m:accounts) -[e:transfers]-> (n:accounts) ORDER BY m.closeness DESC LIMIT 5").print()
+-----------------------------+
| id | closeness |
+-----------------------------+
| 934 | 3.866976024748647E-4 |
| 135 | 3.8595137012736397E-4 |
| 387 | 3.8476337052712584E-4 |
| 406 | 3.8284839203675346E-4 |
| 330 | 3.7425149700598805E-4 |
+-----------------------------+
Parent topic: Centrality Algorithms
16.6.3.3 Harmonic Centrality
The Harmonic Centrality algorithm improves closeness centrality to better account for disconnected graphs.
This algorithm can be applied in the following scenarios for a disconnected graph:
- To determine nodes that are central to the overall structure of the graph, facilitating efficient information flow or influence spreading.
- In social network analysis, for example, finding influencers who can disseminate information quickly to the entire network.
- In financial transactions analysis, for example, accounts with high closeness centrality might facilitate rapid money movement, making them potential candidates for closer monitoring for fraudulent activities such as money laundering.
- In studying spreading of diseases, for example, identifying individuals who can potentially spread a disease to many others quickly due to their central position in the network.
Example 16-4 Running the Harmonic Centrality Algorithm
The following example measures the harmonic centrality value for each
vertex account in BANK_GRAPH
and prints the top five accounts that
have higher levels of connections with the other accounts.
opg4j> var graph = session.readGraphByName("BANK_GRAPH",GraphSource.PG_PGQL)
graph ==> PgxGraph[name=BANK_GRAPH,N=1000,E=4996,created=1643308582055]
opg4j> var a = session.createAnalyst()
a ==> NamedArgumentAnalyst[session=4c054326-600d-47d3-ab40-36b41fa0e339]
opg4j> a.harmonicCentrality(graph)
VertexProperty[name=hc,type=double,graph=BANK_GRAPH_PGQL]
opg4j> graph.queryPgql("SELECT DISTINCT m.id, m.hc FROM MATCH (m:accounts) -[e:transfers]-> (n:accounts) ORDER BY m.hc DESC LIMIT 5").print()
+--------------------------+
| id | hc |
+--------------------------+
| 34 | 193.53134920634574 |
| 770 | 193.5238095238061 |
| 778 | 193.41904761904416 |
| 262 | 193.32936507936165 |
| 243 | 192.78293650793313 |
+--------------------------+
$9 ==> PgqlResultSetImpl[graph=BANK_GRAPH_PGQL,numResults=5]
PgxGraph graph = session.readGraphByName("BANK_GRAPH",GraphSource.PG_PGQL);
Analyst a = session.createAnalyst();
a.harmonicCentrality(graph);
PgqlResultSet rs = g1.queryPgql("SELECT DISTINCT m.id, m.hc FROM MATCH (m:accounts) -[e:transfers]-> (n:accounts) ORDER BY m.hc DESC LIMIT 5");
rs.print();
>>> graph = session.read_graph_by_name('BANK_GRAPH', 'pg_pgql')
>>> a = session.create_analyst()
>>> a.harmonic_centrality(graph)
VertexProperty(name: harmonic_centrality, type: double, graph: BANK_GRAPH_PGQL)
>>> graph.query_pgql("SELECT DISTINCT m.id, m.harmonic_centrality FROM MATCH (m:accounts) -[e:transfers]-> (n:accounts) ORDER BY m.harmonic_centrality DESC LIMIT 5").print()
+---------------------------+
| id | harmonic_centrality |
+---------------------------+
| 34 | 193.3884920634886 |
| 262 | 193.32936507936165 |
| 56 | 193.10158730158386 |
| 544 | 192.87738095237754 |
| 408 | 192.73452380952043 |
+---------------------------+
Parent topic: Centrality Algorithms
16.6.3.4 Vertex Betweenness Centrality
The Vertex Betweenness Centrality algorithm identifies nodes that act as critical bridges, controlling the flow of information or resources through the graph.
This algorithm can be applied in the following scenarios:
- To identifying nodes that are strategically positioned for mediating interactions within the graph.
- In social network analysis, for example, highlighting users who act as connectors between different groups or communities.
- In financial transactions analysis, for example, detecting accounts that facilitate transactions between otherwise unconnected parts of the financial network.
- In studying spreading of diseases, for example, identifying individuals who serve as bridges between different social groups, potentially controlling the spread of a disease across the entire population.
The following three variants are supported for Vertex Betweenness Centrality:
- Vertex Betweenness Centrality: The Betweenness Centrality of a
vertex
v
in a graph is the sum of the fraction of shortest paths that pass throughv
from all the possible shortest paths connecting every possible pair of verticess
,t
in the graph, such thatv
is different froms
andt
. This algorithm applies for connected graphs. - Approximate Vertex Betweenness Centrality with Random Seeds: This variant of betweenness centrality approximates the centrality of the vertices by just using k random vertices as starting points for the BFS traversals of the graph, instead of computing the exact value by using all the vertices in the graph.
- Approximate Vertex Betweenness Centrality From seeds: This variant of betweenness centrality approximates the centrality of the vertices by just using the vertices from the given sequence as starting points for the BFS traversals of the graph, instead of computing the exact value by using all the vertices in the graph.
Example 16-5 Running the Betweenness Centrality Algorithm
The following example identifies the top five accounts that act as
critical bridges in graph g1
.
opg4j> var graph = session.readGraphByName("BANK_GRAPH",GraphSource.PG_PGQL)
graph ==> PgxGraph[name=BANK_GRAPH,N=1000,E=4996,created=1643308582055]
opg4j> var a = session.createAnalyst()
a ==> NamedArgumentAnalyst[session=4c054326-600d-47d3-ab40-36b41fa0e339]
opg4j> a.vertexBetweennessCentrality(graph)
$10 ==> VertexProperty[name=betweenness,type=double,graph=BANK_GRAPH_PGQL]
opg4j> graph.queryPgql("SELECT DISTINCT m.id, m.betweenness FROM MATCH (m:accounts) -[e:transfers]-> (n:accounts) ORDER BY m.betweenness DESC LIMIT 5").print()
+--------------------------+
| id | betweenness |
+--------------------------+
| 387 | 18913.34886094081 |
| 352 | 16625.818593102595 |
| 135 | 15190.461087012543 |
| 934 | 14642.317059371073 |
| 222 | 13688.935057639192 |
+--------------------------+
$11 ==> PgqlResultSetImpl[graph=BANK_GRAPH_PGQL,numResults=5]
PgxGraph graph = session.readGraphByName("BANK_GRAPH",GraphSource.PG_PGQL);
Analyst a = session.createAnalyst();
a.vertexBetweennessCentrality(graph);
PgqlResultSet rs = g1.queryPgql("SELECT DISTINCT m.id, m.betweenness FROM MATCH (m:accounts) -[e:transfers]-> (n:accounts) ORDER BY m.betweenness DESC LIMIT 5");
rs.print();
>>> graph = session.read_graph_by_name('BANK_GRAPH', 'pg_pgql')
>>> a = session.create_analyst()
>>> a.vertex_betweenness_centrality(graph)
VertexProperty(name: betweenness, type: double, graph: BANK_GRAPH_PGQL)
>>> graph.query_pgql("SELECT DISTINCT m.id, m.betweenness FROM MATCH (m:accounts) -[e:transfers]-> (n:accounts) ORDER BY m.betweenness DESC LIMIT 5").print()
+--------------------------+
| id | betweenness |
+--------------------------+
| 387 | 18913.34886094081 |
| 352 | 16625.818593102595 |
| 135 | 15190.461087012543 |
| 934 | 14642.317059371073 |
| 222 | 13688.935057639192 |
+--------------------------+
Parent topic: Centrality Algorithms
16.6.3.5 PageRank
PageRank assigns a numerical weight to each vertex, measuring its relative importance within the graph.
This algorithm can be applied in the following scenarios:
- To identify nodes that directly and indirectly influence other nodes.
- In place of Eigenvector centrality, when the underlying process includes randomness, for example, random browsing.
- In social networks analysis, for example, finding influencers with direct and indirect influence within the graph.
- In disease spreading analysis, for example, assessing the risk of individuals in spreading a disease.
PageRank computes a rank value between 0
and 1
for each vertex (node) in the graph and stores the values in a double
property. The algorithm therefore creates a vertex property of type double
for the output.
In the graph server (PGX), there are two types of vertex and edge properties:
-
Persistent Properties: Properties that are loaded with the graph from a data source are fixed, in-memory copies of the data on disk, and are therefore persistent. Persistent properties are read-only, immutable and shared between sessions.
-
Transient Properties: Values can only be written to transient properties, which are private to a session. You can create transient properties by calling
createVertexProperty
andcreateEdgeProperty
onPgxGraph
objects, or by copying existing properties usingclone()
on Property objects.Transient properties hold the results of computation by algorithms. For example, the PageRank algorithm computes a rank value between 0 and 1 for each vertex in the graph and stores these values in a transient property named
pg_rank
. Transient properties are destroyed when the Analyst object is destroyed.
The following variants of PageRank algorithms are supported:
- Classic PageRank: Computes ranking scores for the vertices using the network created by the incoming edges in the graph. Although it is intended for directed graphs, undirected graphs can be treated as well by converting them into directed graphs with reciprocated edges (that is, keeping the original edge and creating a second one going in the opposite direction).
- Approximate PageRank: Computes the ranking scores for the vertices in similar way to the classic algorithm without normalization and with a more relaxed convergence criteria since the tolerated error value is compared against each single vertex in the graph, instead of looking at the cumulative vertex error. Thus this variant will converge faster than the classic algorithm, but the ranking values might not be as accurate as in the classic implementation.
- Weighted PageRank: Similar to the classic PageRank algorithm, except that it allows for a weight value to be assigned to each edge. This weight determines the fraction of the PageRank score that will flow from the source vertex through the current edge to its destination vertex.
Example 16-6 Running the PageRank Algorithm
The following example runs the PageRank algorithm on BANK_GRAPH
to identify the top five accounts with the highest PageRank values. The PageRank algorithm uses
the following default values for the input parameters: error (tolerance = 0.001), damping factor
= 0.85, and maximum number of iterations = 100.
opg4j> var graph = session.readGraphByName("BANK_GRAPH",GraphSource.PG_PGQL)
graph ==> PgxGraph[name=BANK_GRAPH,N=1000,E=4996,created=1643308582055]
opg4j> var a = session.createAnalyst()
a ==> NamedArgumentAnalyst[session=4c054326-600d-47d3-ab40-36b41fa0e339]
opg4j> a.pagerank(graph, 0.001, 0.85, 100)
$12 ==> VertexProperty[name=pagerank,type=double,graph=BANK_GRAPH_PGQL]
opg4j> graph.queryPgql("SELECT DISTINCT m.id, m.pagerank FROM MATCH (m:accounts) -[e:transfers]-> (n:accounts) ORDER BY m.pagerank DESC LIMIT 5").print()
+-----------------------------+
| id | pagerank |
+-----------------------------+
| 387 | 0.0073028362522059255 |
| 406 | 0.0067344306145590786 |
| 135 | 0.006725965475577352 |
| 934 | 0.0066413407648344865 |
| 397 | 0.0057016075312134595 |
+-----------------------------+
$13 ==> PgqlResultSetImpl[graph=BANK_GRAPH_PGQL,numResults=5]
PgxGraph graph = session.readGraphByName("BANK_GRAPH",GraphSource.PG_PGQL);
Analyst a = session.createAnalyst();
a.pagerank(graph, 0.001, 0.85, 100);
PgqlResultSet rs = g1.queryPgql("SELECT DISTINCT m.id, m.pagerank FROM MATCH (m:accounts) -[e:transfers]-> (n:accounts) ORDER BY m.pagerank DESC LIMIT 5");
rs.print();
>>> graph = session.read_graph_by_name('BANK_GRAPH', 'pg_pgql')
>>> a = session.create_analyst()
>>> a.pagerank(graph, 0.001, 0.85, 100)
VertexProperty(name: pagerank, type: double, graph: BANK_GRAPH_PGQL)
>>> graph.query_pgql("SELECT DISTINCT m.id, m.pagerank FROM MATCH (m:accounts) -[e:transfers]-> (n:accounts) ORDER BY m.pagerank DESC LIMIT 5").print()
+-----------------------------+
| id | pagerank |
+-----------------------------+
| 387 | 0.0073028362522059255 |
| 406 | 0.0067344306145590786 |
| 135 | 0.006725965475577352 |
| 934 | 0.0066413407648344865 |
| 397 | 0.0057016075312134595 |
+-----------------------------+
Parent topic: Centrality Algorithms