A collection of resources and jottings on spectral clustering.
Spectral clustering is a way to carry out graph partitioning, namely a means of cutting a graph along some of its edges to form two or more disjoint sets of nodes. Spectral clustering is so called because it makes use of the spectrum of the graph’s Laplacian matrix (the set of its eigenvalues) along with their associated eigenvectors. We will see that for the purposes of partitioning a graph, we are interested in the second-smallest eigenvalue and its associated eigenvector, which is referred to as the Fiedler vector.
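As a concrete sketch of this idea, the Fiedler vector of a small graph can be computed with numpy and its signs used to bipartition the nodes. The graph below is made up for illustration; it is not from the material discussed here.

```python
import numpy as np

# Toy weighted undirected graph: two triangles (0-1-2 and 3-4-5)
# joined by a single bridge edge (2, 3). All weights are 1.
W = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    W[i, j] = W[j, i] = 1.0

D = np.diag(W.sum(axis=1))   # diagonal degree matrix
L = D - W                    # unnormalized graph Laplacian

# eigh returns eigenvalues in ascending order for a symmetric matrix;
# the smallest eigenvalue of a connected graph's Laplacian is 0.
eigvals, eigvecs = np.linalg.eigh(L)
fiedler = eigvecs[:, 1]      # eigenvector of the second-smallest eigenvalue

# Partition the nodes by the sign of their Fiedler-vector entries:
# this cuts the graph across the bridge, separating the two triangles.
cluster = fiedler >= 0
```

The sign-based split recovers the two triangles, which is the intuitively correct cut for this graph.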
The task of graph partitioning generically requires answers to two questions:
- How do we define a good partition of a graph?
- How can we identify such a partition, and ideally compute it efficiently?
In contrast to the Louvain algorithm described in a [previous post]({{ site.baseurl }}{% link _posts/2021-04-06-graphs-communities.md %}#finding-communities-maximising-modularity-with-the-louvain-algorithm), which employs modularity as a metric for the extent of a graph’s community structure, spectral clustering uses conductance, defined for a graph cut[^1], as a criterion for graph partitioning.
One final note before delving into the details: a significant strength of spectral clustering over, for example, the Louvain heuristic is that it provides approximation guarantees for how well it clusters the graph.
## Spectral Clustering
When doing spectral clustering, we define a weighted undirected graph, $G = (V, E)$, with its vertex (node) set, $V$, and edge set, $E$, where a given edge, $(i, j) \in E$, connects adjacent nodes. A sufficient data structure to represent this graph is its adjacency matrix:

$$W = (w_{ij})_{i, j = 1, \ldots, n},$$

which consists of the weights, $w_{ij}$, between nodes $i$ and $j$. If two nodes, $i$ and $j$, have zero edge weight, $w_{ij} = 0$, they are not connected, and additionally, since the graph, $G$, is undirected, we require that the adjacency matrix is symmetric: $w_{ij} = w_{ji}$. Nodes are represented by rows or columns in this representation of the graph object.
Further to this, we define the degree of a node, $i$, as the row sum[^2] of the row corresponding to $i$:

$$d_i = \sum_{j=1}^{n} w_{ij}.$$
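As a minimal illustration of these definitions (the weights below are made up), the symmetry requirement and the row-sum degree computation look like:

```python
import numpy as np

# Hypothetical weighted undirected graph on four nodes;
# a zero entry means the corresponding nodes are not connected.
W = np.array([
    [0.0, 2.0, 0.0, 1.0],
    [2.0, 0.0, 3.0, 0.0],
    [0.0, 3.0, 0.0, 4.0],
    [1.0, 0.0, 4.0, 0.0],
])

assert np.allclose(W, W.T)   # undirected graph => symmetric adjacency matrix

degrees = W.sum(axis=1)      # degree of node i as the sum of row i
```

By symmetry, summing over rows or over columns gives the same degrees.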
Conductance constitutes an objective function which we seek to minimise. It consists of the sum of the weights of the cut edges, normalised by the minimum volume of the clusters created, where the volume of a cluster is the sum of its edge weights. This formulation is known as the Normalized Cut, or Ncut (Shi and Malik 2000[^3]), but other underlying criterion choices for spectral clustering exist, notably including RatioCut (Hagen and Kahng 1992[^4]), which uses the size (number of nodes) of components as a balancing criterion when making cuts.
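A sketch of the conductance of a cut as just described; the graph and the function are illustrative, not from any particular library.

```python
import numpy as np

# Toy graph: two triangles (0-1-2 and 3-4-5) joined by one bridge edge (2, 3).
W = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    W[i, j] = W[j, i] = 1.0

def conductance(W, S):
    """Cut weight between S and its complement, over the minimum cluster volume."""
    S = np.asarray(S)
    T = np.setdiff1d(np.arange(W.shape[0]), S)
    cut = W[np.ix_(S, T)].sum()              # total weight of the severed edges
    vol_S, vol_T = W[S].sum(), W[T].sum()    # volume = sum of a cluster's edge weights
    return cut / min(vol_S, vol_T)

good = conductance(W, [0, 1, 2])   # sever only the bridge: low conductance
bad = conductance(W, [0])          # isolate one node: high conductance
```

Minimising this quantity favours cuts that sever little weight while keeping both sides substantial, which is why severing the bridge scores better than isolating a single node.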
## Recapitulating Graph Fundamentals
Note that the unnormalized graph Laplacian, $L = D - W$ (where $D$ is the diagonal degree matrix), does not depend on the diagonal elements of the adjacency matrix, $W$. Each adjacency matrix which coincides with $W$ on all off-diagonal positions leads to the same unnormalized graph Laplacian, $L$. In particular, self-edges in a graph do not change the corresponding graph Laplacian.
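A quick numerical check of this invariance on a made-up triangle graph, assuming the unnormalized Laplacian $L = D - W$:

```python
import numpy as np

def laplacian(W):
    """Unnormalized graph Laplacian L = D - W."""
    return np.diag(W.sum(axis=1)) - W

# Triangle graph, then the same graph with arbitrary self-edge weights added.
W = np.array([
    [0.0, 1.0, 1.0],
    [1.0, 0.0, 1.0],
    [1.0, 1.0, 0.0],
])
W_loops = W + np.diag([5.0, 2.0, 7.0])

# Self-edges raise the degrees and the adjacency diagonal by the same amounts,
# so the two contributions cancel in D - W and the Laplacians coincide.
```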
## Forthcoming
- **Incorporate the spectral-clustering.ipynb notebook: https://colab.research.google.com/drive/11x3zs7lvYBv-SAWKHybiii-BIA9pteQF?usp=sharing.**
- Compare with / unpack the source code of the sklearn implementation: https://scikit-learn.org/stable/modules/generated/sklearn.cluster.SpectralClustering.html
- Perhaps apply spectral clustering to the Pandemic data as per this approach: https://stackoverflow.com/questions/46258657/spectral-clustering-a-graph-in-python
- Compare this clustering approach to others: https://scikit-learn.org/stable/auto_examples/cluster/plot_cluster_comparison.html#sphx-glr-auto-examples-cluster-plot-cluster-comparison-py
## References and Notes
References cited in Jabri, Owens and Efros (2020) *Space-Time Correspondence as a Contrastive Random Walk*:
- spectral clustering (emphasis on relevance to computer vision) [88,89,28]
- 88: Jianbo Shi and Jitendra Malik. Motion segmentation and tracking using normalized cuts. In Sixth International Conference on Computer Vision (IEEE Cat. No. 98CH36271), pages 1154–1160. IEEE, 1998. https://ieeexplore.ieee.org/document/710861
- 89: Jianbo Shi and Jitendra Malik. Normalized cuts and image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(8):888–905, 2000.
- 28: Charless Fowlkes, Serge Belongie, Fan Chung, and Jitendra Malik. Spectral grouping using the Nyström method. IEEE Transactions on Pattern Analysis and Machine Intelligence, 26(2):214–225, 2004. https://people.eecs.berkeley.edu/~malik/papers/FBCM-nystrom.pdf.
- MRF/GraphCuts [10]: Yuri Boykov and Gareth Funka-Lea. Graph cuts and efficient N-D image segmentation. International Journal of Computer Vision, 70(2):109–131, 2006.
- Meila and Shi [67], which poses Normalized Cuts as a Markov random walk and describes an algorithm for learning an affinity function for segmentation by fitting the transition probabilities to be uniform within segments and zero otherwise.
- Marina Meila and Jianbo Shi. Learning segmentation by random walks. In Advances in Neural Information Processing Systems, pages 873–879, 2001. https://proceedings.neurips.cc/paper/2000/file/069654d5ce089c13f642d19f09a3d1c0-Paper.pdf
For a comprehensive reference listing useful properties of vectors and matrices, see The Matrix Cookbook (2012), put together by Kaare Brandt Petersen and Michael Syskind Pedersen.
CS267: Notes for Lecture 23, April 9, 1999 Graph Partitioning, Part 2
Proof of theorem (M. Fiedler, “A property of eigenvectors of nonnegative symmetric matrices and its application to graph theory”, Czech. Math. J. 25:619–637, 1975.): Let $G$ be connected, and $N_-$ and $N_+$ be defined by the above algorithm. Then $N_-$ is connected. If no $v_2(n) = 0$, $N_+$ is also connected.
Ulrike von Luxburg (2007) A Tutorial on Spectral Clustering. Statistics and Computing, 17 (4). https://arxiv.org/pdf/0711.0189.pdf
Getting Started with Spectral Clustering (4th April 2020) by Juan Camilo Orduz
This post uses Lecture 5: Spectral Clustering of Stanford CS224W - Machine Learning with Graphs as a jumping off point. The lecture is part of those given in Fall 2019. Further generic materials for study of graph theory are signposted [here]({{ site.baseurl }}{% link learn.md %}#graphs).
## Footnotes

[^1]: The definition of conductance over a whole graph, as opposed to a cut, is the minimum conductance over all possible cuts.

[^2]: By symmetry of the adjacency matrix, $w_{ij} = w_{ji}$, we could equivalently compute the degree as the column sum, taking the summation over the row index, $i$.

[^3]: Jianbo Shi and Jitendra Malik (2000) Normalized Cuts and Image Segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 22, No. 8. https://repository.upenn.edu/cgi/viewcontent.cgi?article=1101&context=cis_papers. Note that this work was originally published three years earlier as Jianbo Shi and Jitendra Malik (1997) Normalized Cuts and Image Segmentation. Report No. UCB/CSD-97-940, May 1997, Computer Science Division (EECS), University of California, Berkeley, California 94720. https://digitalassets.lib.berkeley.edu/techreports/ucb/text/CSD-97-940.pdf.

[^4]: Hagen and Kahng (1992) New spectral methods for ratio cut partitioning and clustering. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, Volume 11, Issue 9, September 1992. http://snap.stanford.edu/class/cs224w-readings/hagen91spectral.pdf.