Graphs: Motifs, Graphlets and Structural Roles in Networks
Networks and nodes can be characterised and compared by finding and profiling their network motifs (subgraphs), specifically induced subgraphs and graphlets (connected subgraphs). The structural roles of nodes can be learned in an unsupervised way via recursive node feature extraction and clustering, and these techniques can be combined with subgraph-based analysis.
We can use subgraphs, or equivalently subnetworks, to characterise and discriminate between graphs, considering only non-isomorphic subgraphs, i.e. graphs that are genuinely structurally distinct rather than different visual representations of the same graph.
Graph Isomorphism
An isomorphism of graphs $G$ and $H$ is a bijection (a one-to-one mapping) between the vertex sets of $G$ and $H$: \(f \colon V(G) \to V(H)\) such that any two vertices $u$ and $v$ of $G$ are adjacent in $G$ if and only if $f(u)$ and $f(v)$ are adjacent in $H$, i.e. the bijection is “edge-preserving”, in accordance with the general notion of isomorphism. (Isomorphism is a structure-preserving mapping between two structures of the same type that can be reversed by an inverse mapping.)
In order to analyse the graph, we’ll define a notion of significance using motifs and graphlets. We will use these motifs to build network significance profiles to compare graphs against each other.
Motifs and Graphlets
A graph motif is a recurring pattern of interconnections. Such a pattern is commonly a small induced subgraph[^1] of the main graph, $G = (V, E)$: given a subset of nodes, $S \subseteq V$, the induced subgraph $G[S]$ contains exactly those edges of $G$ whose endpoints both lie in $S$.
Examples of Three-Node Graph Motifs. [Image Source: Evidence for additional structure in real networks at Math Insight.]
Motifs could be things like feed-forward loops (occurring in networks of neurons, and reminiscent of skip-connections, e.g. in ResNet[^2]), parallel loops (food webs) or single-input modules (as occur in gene regulatory networks).
Note that when we count occurrences of motifs, we allow instances to overlap: some nodes may be shared across motif occurrences, so long as each occurrence is distinct as a whole.
We define significance as a function of the expectation of occurrence of motifs in a graph, given its size. This calls for a null model, which we provide via generation of a random graph. We can generate our random graph intelligently to have similar characteristics to the graphs under study, isolating significance along the dimensions (features of the graph) that we are interested in.
Practically, we define the significance of motif $i$ via a z-score
\[Z_i = \dfrac{N_i^{real} - \bar{N}_i^{rand}}{std(N_i^{rand})}\]

where $N_i^{real}$ is the number of occurrences of motif $i$ in the empirical graph, $\bar{N}_i^{rand}$ is the mean count across the ensemble of random graphs and $std(N_i^{rand})$ is the corresponding standard deviation.
The network significance profile, $\mathbf{SP}$, is a vector with elements, $SP_i$, for each motif, given by normalizing the vector of z-scores to unit length
\[SP_i = \dfrac{Z_i}{\sqrt{\sum_j Z_j^2}}\]

Use of z-scores appropriately normalizes for graphs of different sizes (although the second normalization of the vector to unit length is not necessary in general). Larger graphs tend to display greater z-scores[^3].
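To make the computation concrete, here is a minimal sketch in Python (NumPy only; the function name and array shapes are my own choices, not from the lecture) that turns motif counts into z-scores and a significance profile:

```python
import numpy as np

def significance_profile(counts_real, counts_rand):
    """Motif z-scores and the network significance profile (SP).

    counts_real: shape (n_motifs,), motif counts in the real graph.
    counts_rand: shape (n_random_graphs, n_motifs), motif counts in each
                 graph drawn from the null model.
    Assumes each motif's count varies across the ensemble (non-zero std).
    """
    counts_real = np.asarray(counts_real, dtype=float)
    counts_rand = np.asarray(counts_rand, dtype=float)
    z = (counts_real - counts_rand.mean(axis=0)) / counts_rand.std(axis=0)
    sp = z / np.linalg.norm(z)  # normalise the z-score vector to unit length
    return z, sp
```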
Generating Random Graphs for the Null Model
As mentioned, we can generate the random null model intelligently to capture more properties of the real graph. We’ll retain:
- The number of nodes, $\vert V \vert = N$
- The number of edges, $\vert E \vert$
- The degree sequence, $k_1, k_2, \dots, k_N = \{k_i\}$
Configuration Model
Obtaining properties (1) and (2) is trivial, and we can obtain property (3) using the Configuration Model, $G(N, \{k_i\})$, which represents a uniform distribution over all graphs with $N$ vertices having the degree sequence $\{k_i\}$.
Configuration Model
- Given a degree sequence, $k_1, k_2, \dots, k_N$, assign a degree, $k_i$ to each vertex producing spokes.
- Randomly select two spokes and connect them.
- Repeat (2) until running out of spokes.
A run of the configuration model is shown in the figure below, where the array denotes the node IDs and each node has an entry for each spoke (“half-edge”) it is given in step (1) of the algorithm.
A run of the Configuration Model where spokes are assigned to nodes according to their degree and then a random graph is constructed by randomly connecting spokes to form edges. [Image Source: Lecture 4 of Network Analysis and Modeling, CSCI 5352 by Prof. Aaron Clauset (2017)]
The soft configuration model for random graph generation does not guarantee the given degree sequence exactly (it matches it only in expectation), but note that the probability of large disparities decreases exponentially in the size of the graph.
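networkx ships a configuration model implementing this spoke-pairing procedure; a minimal sketch (the degree sequence is purely illustrative):

```python
import networkx as nx

# Spoke-pairing yields a MultiGraph that may contain self-loops and
# multi-edges; for a null model these are typically discarded, which is
# why the resulting simple graph only approximately preserves the degrees.
degrees = [3, 2, 2, 2, 1]  # illustrative; must sum to an even number
G_rand = nx.configuration_model(degrees, seed=42)
G_simple = nx.Graph(G_rand)  # collapse parallel edges
G_simple.remove_edges_from(nx.selfloop_edges(G_simple))
print(sorted(d for _, d in G_simple.degree()))  # close to, not always equal to, degrees
```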
Properties of the Configuration Model
Some of the properties below omit small additive terms which disappear in the limit as the graph becomes large (e.g. as $\vert E \vert \rightarrow \infty$). Full derivations can be found in Lecture 4 of Network Analysis and Modeling, CSCI 5352 by Prof. Aaron Clauset (2017).
Probability $\mathbf{i}$ is connected to $\mathbf{j}$
For nodes $i$ and $j$, the probability that a particular spoke of $i$ is matched to a particular spoke of $j$ is $\frac{1}{2m - 1}$, where $m = \vert E \vert$ is the number of edges (a given spoke can pair with any one of the remaining $2m - 1$ spokes). Since there are $k_i \cdot k_j$ such spoke pairings, the probability that $i$ shares an edge, $(i, j)$, with $j$ is $\frac{k_i \cdot k_j}{2m - 1}$, or $\frac{k_i \cdot k_j}{2m}$ in the limit as $m \rightarrow \infty$.
\[P((i, j) \in E) = \dfrac{k_i \cdot k_j}{2m - 1} \approx \dfrac{k_i \cdot k_j}{2m}\]

Expected number of multi-edges
The probability of a second edge appearing between nodes $i$ and $j$, given that a first has formed, is $(k_i - 1)(k_j - 1) / 2m$; since the first edge forms with probability $k_i \cdot k_j / 2m$, the probability that both form is $k_i k_j (k_i - 1) (k_j - 1) / (2m)^2$. Summing this over distinct pairs of nodes gives:
\[\mathbb{E}[\text{Number of Multi-edges}] = \frac{1}{2}\left [ \frac{\langle k^2 \rangle - \langle k \rangle}{\langle k \rangle} \right ]^2\]

where $\langle k \rangle$ represents the expected (mean) degree, $\mathbb{E}[k]$.
Expected number of self-loops
The logic is as for multi-edges, except that there are $\binom{k_i}{2}$ ways of forming a self-loop at node $i$, giving the probability of a self-loop as $p_{ii} = k_i \cdot (k_i - 1) / 4m$ and, from there, the expected total number of self-loops over the graph as:
\[\mathbb{E}[\text{Number of Self-loops}] = \frac{\langle k^2 \rangle - \langle k \rangle}{2 \langle k \rangle}\]

Since neither of the latter two quantities, the expected number of multi-edges or self-loops, depends on the size of the graph, $N$, they are constant[^4] and therefore the fraction of multi-edges or self-loops is vanishingly small, $\mathcal{O}(1/N)$.
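We can sanity-check the self-loop formula empirically by averaging over many configuration-model draws; the degree sequence and ensemble size below are arbitrary, and since the formula omits small additive terms, a small graph agrees only roughly:

```python
import networkx as nx
import numpy as np

degrees = [3, 3, 2, 2, 2, 2, 1, 1]  # sums to 16, a valid (even) sequence
loops = [nx.number_of_selfloops(nx.configuration_model(degrees, seed=s))
         for s in range(2000)]
k = np.array(degrees, dtype=float)
predicted = (np.mean(k**2) - np.mean(k)) / (2 * np.mean(k))
print(np.mean(loops), "vs predicted", predicted)  # should roughly agree
```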
Switching
An alternative to the Configuration Model is Switching (“Rewiring”):
Switching
- Given a graph, $G = (V, E)$
- Repeat $Q \cdot \vert E \vert$ times:
- Select a pair of edges randomly
- Exchange the endpoints if no multiple edges or self-edges are generated
We choose $Q$ to be large enough for the process to converge, i.e. for the result to be effectively a uniform draw from the space of graphs with that degree sequence. This yields a randomly rewired graph with a degree sequence identical to that of the original input graph, $G$.
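networkx's double_edge_swap implements exactly this switching step on a simple graph (swaps that would create self-loops or parallel edges are rejected); a sketch, where $Q = 100$ is a common heuristic rather than a convergence guarantee:

```python
import networkx as nx

G = nx.karate_club_graph()
G_rand = G.copy()
Q = 100
m = G.number_of_edges()
nx.double_edge_swap(G_rand, nswap=Q * m, max_tries=1000 * Q * m, seed=7)

# The rewired graph has exactly the same degree sequence as the original.
assert sorted(d for _, d in G.degree()) == sorted(d for _, d in G_rand.degree())
```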
We generate many random graphs (e.g. $100$ or $1000$) to accurately estimate the z-scores for hypothesis testing.
We can elaborate on these basic notions by considering directed graphs, coloured nodes (i.e. those carrying class attributes) and temporal motifs (i.e. dynamically changing ones). Additionally, we can vary our notions of frequency and significance and place additional or different constraints on the null model.
Graphlets
A graphlet is a small connected induced subgraph; graphlets are enumerated up to isomorphism, so each graphlet represents a structurally distinct connected pattern.
Graphlets are used to provide node-level subgraph metrics. They generalise the notion of degree from the number of edges a node has to the number of graphlets it touches, yielding the graphlet degree vector (GDV).
The graphlet degree vector takes into account the automorphism orbits, i.e. the ways in which a graph is self-symmetric[^5]. This means that, given a graphlet, the GDV records a separate count for each distinct position (orbit) a node can occupy within it, as shown in the following figure.
Visualisation of graphlets, automorphism orbits, and graphlet degree vectors (GDVs). Shown are all 9 graphlets with 2, 3 and 4 nodes. [Image Source: Tijana Milenković et al. (2011) Dominating Biological Networks. PLoS One.]
Considering the figure, which shows the 9 possible graphlets on up to 4 nodes, we obtain a vector in $\mathbb{Z}^{15}$, with one coordinate per automorphism orbit (numbered 0 to 14), which constitutes a signature of a node describing the topology of its neighbourhood up to a maximal distance of 3 hops.
Graphlet Degree Vectors thus provide a metric of similarity between nodes.
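To make orbits concrete, here is a minimal sketch computing the first four GDV coordinates, i.e. the orbits of the 2- and 3-node graphlets, with networkx (the function name is my own; orbit numbering follows the figure: orbit 0 is the degree, orbits 1 and 2 are the end and middle of a 3-node path, orbit 3 is triangle membership):

```python
import networkx as nx

def gdv_upto_3(G):
    """Graphlet degree vector restricted to orbits 0-3 (2- and 3-node graphlets)."""
    triangles = nx.triangles(G)  # triangles each node participates in
    gdv = {}
    for v in G:
        deg = G.degree(v)
        t = triangles[v]
        # Orbit 1: v is an endpoint of a path v-u-w with w not adjacent to v.
        ends = sum(G.degree(u) - 1 for u in G[v]) - 2 * t
        # Orbit 2: v is the centre of a path u-v-w with u, w non-adjacent.
        centres = deg * (deg - 1) // 2 - t
        gdv[v] = [deg, ends, centres, t]
    return gdv

print(gdv_upto_3(nx.karate_club_graph())[0])  # signature of node 0
```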
Finding Motifs and Graphlets
Finding size-$k$ motifs and graphlets (subgraphs) is a two-step problem:
- Enumerate all size-$k$ connected subgraphs
- Count the number of occurrences of each subgraph type (with graph isomorphism test)
However, determining whether a given subgraph exists in a graph is computationally hard: subgraph isomorphism is NP-complete[^6]. Computational complexity grows exponentially in the size of the subgraph, so we usually limit the subgraph size to between 3 and 8 nodes.
NB See Ribeiro et al. (2019) A Survey on Subgraph Counting: Concepts, Algorithms and Applications to Network Motifs and Graphlets https://arxiv.org/abs/1910.13011.
Algorithms
- Exact Subgraph Enumeration (ESU) (Wernicke 2006[^7])
- Kavosh (Kashani et al. 2009[^8])
- Subgraph sampling (Kashtan et al. 2004[^9])
Today we commonly use the Exact Subgraph Enumeration (ESU) algorithm.
The Exact Subgraph Enumeration Algorithm (ESU) (Wernicke 2006) for Finding Subgraphs
The Exact Subgraph Enumeration algorithm is a baseline approach for subgraph counting.
The idea is to start at a node, $v$, and build the subgraph, $V_{subgraph}$ by adding nodes incrementally. We store candidates to add in an extension set, $V_{extension}$, to which we in turn add nodes, $u$, that satisfy two properties:
- The node ID[^10] of $u$ must be greater than that of $v$
- $u$ may be a neighbour only of the newly added node, $w$, and not of any node already in the current subgraph, $V_{subgraph}$.
The algorithm is recursive and it builds an implicit tree-like structure of maximal depth $k$, which we call the ESU-Tree.
Expressed in pseudocode, the algorithm is as follows:
EnumerateSubgraphs($G, k$)
Input: A graph $G = (V, E)$ and an integer $1 \leq k \leq \vert V \vert$
Output: All size-$k$ subgraphs in $G$
- for each vertex $v \in V$ do
- $V_{extension} \leftarrow \{u \in N(\{v\}) \ \colon u > v \}$
- call ExtendSubgraph($\{v\}, V_{extension}, v$)
- return
ExtendSubgraph($V_{subgraph}, V_{extension}, v$)
- if $\vert V_{subgraph} \vert = k$ then output $G[V_{subgraph}]$ and return
- while $V_{extension} \neq \emptyset$ do
- Remove an arbitrarily chosen vertex, $w$, from $V_{extension}$
- $V_{extension}' \leftarrow V_{extension} \cup \{u \in N_{excl}(w, V_{subgraph}) \ \colon \ u > v \}$
- call ExtendSubgraph($V_{subgraph} \cup \{w\}, V_{extension}', v$)
- return
where $N_{excl}(w, V_{subgraph}) = N(w) \setminus \left( V_{subgraph} \cup N(V_{subgraph}) \right)$ is the exclusive neighbourhood: all nodes neighbouring $w$ but not in $V_{subgraph}$ or $N(V_{subgraph})$.
The two stages of this algorithm can be understood intuitively as follows.
First, EnumerateSubgraphs starts at a vertex, $v$, and extends subgraphs from there. It does this for each vertex in the graph by calling the ExtendSubgraph subroutine.
ExtendSubgraph recursively extends subgraphs until it builds a subgraph of size $k$, our input subgraph size (covered by the `if` in line 1), or runs out of nodes to extend with (the `while` condition in line 2). The crucial point is the maintenance of the extension set, $V_{extension}'$: candidate nodes are admitted only if their node IDs are greater than that of the root vertex, $v$, and they extend from the newly added node without touching the existing subgraph, which ensures each subgraph is enumerated exactly once.
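The pseudocode transcribes almost line for line into Python; a dependency-free sketch (function and variable names are my own) using adjacency sets:

```python
def enumerate_subgraphs(adj, k):
    """ESU: yield each connected size-k subgraph exactly once as a frozenset.

    adj maps each node ID (assumed mutually comparable, e.g. ints) to the
    set of its neighbours."""
    for v in adj:
        extension = {u for u in adj[v] if u > v}
        yield from _extend_subgraph({v}, extension, v, adj, k)

def _extend_subgraph(subgraph, extension, v, adj, k):
    if len(subgraph) == k:
        yield frozenset(subgraph)
        return
    extension = set(extension)
    while extension:
        w = extension.pop()
        # Exclusive neighbourhood of w: neighbours of w neither in the
        # subgraph nor adjacent to it, restricted to IDs greater than v.
        closed = subgraph | set().union(*(adj[s] for s in subgraph))
        excl = {u for u in adj[w] if u > v and u not in closed}
        yield from _extend_subgraph(subgraph | {w}, extension | excl, v, adj, k)

# A triangle (0, 1, 2) with a pendant node 3 has three connected 3-node subgraphs.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
print(list(enumerate_subgraphs(adj, 3)))
```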
Up to this point, we have simply enumerated the subgraphs. In order to count them, we have to identify isomorphic subgraphs and consolidate these. This is typically done with McKay’s NAUTY[^11] Algorithm (McKay 1981[^12]).
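Without nauty, one can group the enumerated subgraphs into isomorphism classes by exact pairwise testing against class representatives; this is fine for small $k$ but far slower than canonical labelling (the sketch below reuses enumerate_subgraphs from above with a networkx graph):

```python
import networkx as nx
from collections import Counter

def count_motif_classes(G, k):
    """Count occurrences of each isomorphism class of connected size-k
    induced subgraphs of G."""
    adj = {v: set(G[v]) for v in G}
    representatives, counts = [], Counter()
    for nodes in enumerate_subgraphs(adj, k):
        H = G.subgraph(nodes)
        for i, R in enumerate(representatives):
            if nx.is_isomorphic(H, R):
                counts[i] += 1
                break
        else:  # no matching class yet: H founds a new one
            representatives.append(H.copy())
            counts[len(representatives) - 1] += 1
    return representatives, counts
```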
Structural Roles in Networks
Roles identify collections of nodes which have similar positions in a network and are based on the similarity of ties between subsets of nodes. Crucially, nodes playing the same role need not interact with each other, directly or indirectly. Roles could be, for example, clique members or periphery nodes.
This is in contrast to groups or communities in networks, which are based on adjacency, proximity or reachability from each other. (Groups and communities are more commonly used in data mining currently.)
As such, roles and communities are complementary. With universities as an analogy, roles would be e.g. professor or student, while communities would be e.g. the political science department or the computer vision group.
Roles
Roles are useful for interpreting nodes in networks, searching for similar nodes or node classification. Importantly, roles generalize across disconnected networks whereas communities do not.
Roles could be defined in terms of structural equivalence, i.e. the property that two nodes, $u$ and $v$, have the same relationships to all other nodes, for example if two people had the same ties in a social network. Formally, nodes $u$ and $v$ are structurally equivalent if for all other nodes, $k$, $u$ shares an edge with $k$ iff $v$ shares an edge with $k$.
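This definition translates directly into code; a tiny sketch for a networkx-style graph (function name my own):

```python
def structurally_equivalent(G, u, v):
    """True iff u and v have identical ties to every other node,
    i.e. their neighbourhoods agree once u and v themselves are ignored."""
    return set(G[u]) - {v} == set(G[v]) - {u}
```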
This definition is extremely restrictive, so we’ll use a more permissive approach.
RolX
RolX (from Role eXtraction) (Henderson et al. 2011[^13]) is an unsupervised approach for automatic discovery (no priors) of roles for nodes that is linear in the number of edges of a graph: $\mathcal{O}(\vert E \vert)$. It assigns a mixed membership over roles to each node.
Schematically, RolX:
- Takes as input a graph, $G = (V, E)$ as an adjacency matrix, $A \in \mathbb{R}^{\vert V \vert \times \vert V \vert}$
- Recursively extracts features
- Produces a node-by-feature matrix (of dimension $\mathbb{R}^{\vert V \vert \times d}$) where $d$ is the number of features
- Performs role extraction using this matrix
- Returns both (1) a node-by-role matrix ($\mathbb{R}^{\vert V \vert \times \vert R \vert}$) and (2) a role-by-feature matrix ($\mathbb{R}^{\vert R \vert \times d}$)
- Note this is effectively a decomposition of the intermediate node-feature matrix
The feature extraction is recursive in that it first extracts local features, then those of the egonet (the network of the immediate neighbourhood, i.e. connections to and between neighbours) and then those of increasingly large neighbourhoods.
Local features include the degree, in-degree, out-degree and weighted degree, as applicable (undirected, directed; weighted).
Egonet features are the same properties also considering edges entering or leaving the egonet.
RolX works by using the current set of node features (starting with the base node features) to generate additional ones by employing the mean and sum aggregation functions (e.g. compute the mean value of the unweighted degree amongst neighbours of a node). It repeats this process on the newly generated features (including those resulting from the aggregations).
The number of features is therefore exponential in the number of iterations (i.e. hops from the starting node). To constrain the number of features so that the node-feature matrix fits in memory, we prune them by eliminating features that are correlated with existing features above a user-defined threshold.
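A minimal sketch of this recursive generate-and-prune loop (the iteration count, threshold and names are illustrative; ReFeX/RolX use a more refined pruning scheme):

```python
import networkx as nx
import numpy as np

def _corr(a, b):
    """Absolute Pearson correlation, treating constant columns as uncorrelated."""
    if a.std() == 0 or b.std() == 0:
        return 0.0
    return abs(np.corrcoef(a, b)[0, 1])

def recursive_features(G, n_iter=2, corr_threshold=0.95):
    """Start from the degree, repeatedly append the mean and the sum of each
    existing feature over every node's neighbours, then drop new columns that
    correlate too strongly with an already-kept column."""
    nodes = list(G)
    index = {v: i for i, v in enumerate(nodes)}
    X = np.array([[G.degree(v)] for v in nodes], dtype=float)
    for _ in range(n_iter):
        cols = []
        for j in range(X.shape[1]):
            nbr_vals = [[X[index[u], j] for u in G[v]] for v in nodes]
            cols.append([np.mean(vals) if vals else 0.0 for vals in nbr_vals])
            cols.append([float(np.sum(vals)) for vals in nbr_vals])
        X = np.hstack([X, np.array(cols).T])
        keep = []
        for j in range(X.shape[1]):
            if all(_corr(X[:, j], X[:, i]) < corr_threshold for i in keep):
                keep.append(j)
        X = X[:, keep]
    return nodes, X
```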
Note that we could apply the same procedure but use the graphlet degrees as opposed to the node degrees.
Once the feature matrix is generated, RolX clusters nodes using non-negative matrix factorization[^14], performs model selection according to minimum description length (MDL) and uses the Kullback–Leibler divergence to measure likelihood.
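Sketching the role-extraction step with scikit-learn's NMF (the number of roles is fixed by hand here, whereas RolX selects it automatically via MDL; the KL beta-loss mirrors the likelihood measure above):

```python
import networkx as nx
from sklearn.decomposition import NMF

# Reuses recursive_features from the sketch above. X is non-negative
# (degrees and their neighbourhood sums/means), as NMF requires.
nodes, X = recursive_features(nx.karate_club_graph())
model = NMF(n_components=4, init="nndsvda", solver="mu",
            beta_loss="kullback-leibler", random_state=0)  # 4 roles: a guess
W = model.fit_transform(X)  # node-by-role matrix: mixed role memberships
H = model.components_       # role-by-feature matrix: what each role captures
```

Normalising each row of $W$ to sum to one gives a per-node distribution over roles, which is exactly what the co-authorship example below compares.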
To conclude, and to contextualise this with an application, we can return to the university example and consider paper co-authorships, where nodes are an abstraction for individuals and edges for co-authored papers.
We assign each person a distribution over the set of learned structural roles and determine similarity between people by comparing their role distributions. We might expect to see communities centered around professors (a learned structural role) who co-author papers with many collaborators and peripheral nodes being students (another learned structural role) who largely author papers only with their advisors.
References and Notes
This post is largely based on Lecture 3: Motifs and Structural Roles in Networks from Stanford CS224W - Machine Learning with Graphs, specifically the Fall 2019 Lectures. Further materials for study are signposted here.
[^1]: The definition of a vertex-induced subgraph from Wolfram MathWorld is arguably clearer: a vertex-induced subgraph (sometimes simply called an “induced subgraph”) is a subset of the vertices of a graph $G$ together with any edges whose endpoints are both in this subset.

[^2]: Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun (2015) Deep Residual Learning for Image Recognition. https://arxiv.org/abs/1512.03385. See also ResNet Explained on Papers With Code for a pithy and simple explanation.

[^3]: As is the case in general when we have larger sample sizes, we obtain smaller standard deviations and therefore larger significance metrics (here the z-score), making smaller effects easier to detect.

[^4]: Provided the first and second moments of the distribution producing $\{k_i\}$ are finite, which is not the case if the degree distribution follows a power law with $\alpha < 3$.

[^5]: Graph automorphism is the symmetry by which a graph is mapped onto itself while preserving the edge–vertex connectivity; what isomorphism is to two different graph objects, automorphism is to a single graph object.

[^6]: A computational problem is NP-complete when it belongs to NP (a proposed solution can be verified in polynomial time, $\mathcal{O}(n^k)$, where $n$ is the problem input size and $k$ is a constant) and every problem in NP can be reduced to it in polynomial time; no polynomial-time algorithm is known for any NP-complete problem. See this answer at https://stackoverflow.com/questions/210829/what-is-an-np-complete-in-computer-science.

[^7]: Sebastian Wernicke and Florian Rasche (2006) FANMOD: a tool for fast network motif detection. Bioinformatics. https://academic.oup.com/bioinformatics/article/22/9/1152/199945. According to Ribeiro et al. (2019) A Survey on Subgraph Counting: Concepts, Algorithms and Applications to Network Motifs and Graphlets (https://arxiv.org/abs/1910.13011), the algorithm was originally proposed in Sebastian Wernicke (2005) A faster algorithm for detecting network motifs. In WABI, Vol. 3692. Springer, 165–177, but the FANMOD paper appears to be better known.

[^8]: Kashani et al. (2009) Kavosh: a new algorithm for finding network motifs. BMC Bioinformatics. https://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-10-318.

[^9]: Kashtan et al. (2004) Efficient sampling algorithm for estimating subgraph concentrations and detecting network motifs. Bioinformatics. https://academic.oup.com/bioinformatics/article/20/11/1746/300212.

[^10]: The first condition for adding nodes to $V_{extension}$ (the node ID of $u$ must be greater than that of $v$) does not depend on a specific (permutation of the) numbering of node IDs. All that matters is that node IDs are consistent, because we run the algorithm for each node in $G$.

[^11]: Apparently from “No AUTomorphisms, Yes?”. See https://www3.cs.stonybrook.edu/~algorith/implement/nauty/implement.shtml.

[^12]: B. D. McKay (1981) Practical Graph Isomorphism. Congressus Numerantium, 30, 45–87.

[^13]: Henderson et al. (2011) RolX: Role Extraction and Mining in Large Networks. Lawrence Livermore National Laboratory. https://core.ac.uk/download/pdf/204821124.pdf. Perhaps more usefully, see Henderson et al. (2012) RolX: structural role extraction & mining in large graphs. Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. https://dl.acm.org/doi/abs/10.1145/2339530.2339723.

[^14]: Any other clustering technique, such as $k$-means, could be used instead.