
What is: Large-scale spectral clustering?

Source: Divide-and-conquer based Large-Scale Spectral Clustering
Year: 2000
Data Source: CC BY-SA - https://paperswithcode.com

Spectral Clustering

Spectral clustering aims to partition the data points into $k$ clusters using the spectrum of the graph Laplacian. Given a dataset $X$ with $N$ data points, the spectral clustering algorithm first constructs a similarity matrix $W$, where $w_{ij}$ indicates the similarity between data points $x_i$ and $x_j$ under some similarity metric.

Let $L = D - W$, where $L$ is called the graph Laplacian and $D$ is a diagonal matrix with $d_{ii} = \sum_{j=1}^{N} w_{ij}$. The objective function of spectral clustering can be formulated based on the graph Laplacian as follows:
\begin{equation}
\min_{U} \operatorname{tr}\left(U^{T} L U\right), \quad \text{s.t.} \quad U^{T} U = I,
\end{equation}
where $\operatorname{tr}(\cdot)$ denotes the trace of a matrix. The rows of $U$ are the low-dimensional embeddings of the original data points. Generally, spectral clustering computes $U$ as the bottom $k$ eigenvectors of $L$, and finally applies $k$-means on $U$ to obtain the clustering result.
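The pipeline above (similarity matrix, graph Laplacian, bottom-$k$ eigenvectors, then $k$-means on the embedding) can be sketched with NumPy. This is a minimal toy version assuming an RBF similarity and a simple hand-rolled $k$-means, not a production implementation:

```python
import numpy as np

def spectral_clustering(X, k, sigma=1.0, n_iter=50):
    """Toy spectral clustering: RBF similarity, unnormalized Laplacian
    L = D - W, bottom-k eigenvectors, then k-means on the embedding."""
    N = X.shape[0]
    # Similarity matrix W with w_ij = exp(-||x_i - x_j||^2 / (2 sigma^2))
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    W = np.exp(-sq / (2 * sigma ** 2))
    D = np.diag(W.sum(axis=1))
    L = D - W                        # graph Laplacian
    vals, vecs = np.linalg.eigh(L)   # eigenvalues in ascending order
    U = vecs[:, :k]                  # bottom-k eigenvectors = embedding
    # Plain k-means on the rows of U, with farthest-point initialization
    centers = [U[0]]
    for _ in range(k - 1):
        d = np.min([((U - c) ** 2).sum(1) for c in centers], axis=0)
        centers.append(U[d.argmax()])
    centers = np.array(centers)
    for _ in range(n_iter):
        labels = ((U[:, None, :] - centers[None, :, :]) ** 2).sum(-1).argmin(1)
        for c in range(k):
            if (labels == c).any():
                centers[c] = U[labels == c].mean(axis=0)
    return labels

# Two well-separated blobs should come back as two clean clusters
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.1, (20, 2)), rng.normal(5, 0.1, (20, 2))])
labels = spectral_clustering(X, k=2)
```

Note that this forms the full $N \times N$ matrix $W$, which is exactly the cost that the large-scale methods below are designed to avoid.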

Large-scale Spectral Clustering

To capture the relationship between all data points in $X$, conventional spectral clustering must construct an $N \times N$ similarity matrix, which costs $O(N^2 d)$ time and $O(N^2)$ memory and is not feasible for large-scale clustering tasks. Instead of a full similarity matrix, many accelerated spectral clustering methods use a similarity sub-matrix, representing each data point by its cross-similarity to a set of representative data points (i.e., landmarks):
\begin{equation}
B = \Phi(X, R),
\end{equation}
where $R = \{r_1, r_2, \dots, r_p\}$ ($p \ll N$) is a set of landmarks with the same dimension as $X$, $\Phi(\cdot)$ denotes a similarity metric, and $B \in \mathbb{R}^{N \times p}$ is the similarity sub-matrix representing $X \in \mathbb{R}^{N \times d}$ with respect to $R \in \mathbb{R}^{p \times d}$.
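Constructing $B = \Phi(X, R)$ can be sketched as follows. This assumes the simplest landmark-selection scheme (uniform random sampling from $X$) and an RBF similarity for $\Phi$; other selection strategies such as $k$-means centers and other metrics are equally valid:

```python
import numpy as np

def cross_similarity(X, p, sigma=1.0, seed=0):
    """Build B = Phi(X, R): pick p landmarks R from X by uniform random
    sampling, then compute the N x p RBF cross-similarity to them."""
    rng = np.random.default_rng(seed)
    R = X[rng.choice(X.shape[0], size=p, replace=False)]  # landmarks, p x d
    sq = ((X[:, None, :] - R[None, :, :]) ** 2).sum(-1)   # squared distances
    B = np.exp(-sq / (2 * sigma ** 2))                    # N x p sub-matrix
    return B, R

X = np.random.default_rng(2).normal(size=(1000, 16))
B, R = cross_similarity(X, p=50)
print(B.shape)  # (1000, 50)
```

The cost drops from $O(N^2 d)$ to $O(Npd)$ time and $O(Np)$ memory, which is the point of the landmark representation.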

For large-scale spectral clustering using such a sub-matrix, a symmetric similarity matrix $W$ can be designed as
\begin{equation}
W = \begin{bmatrix} \mathbf{0} & B \\ B^{T} & \mathbf{0} \end{bmatrix}.
\end{equation}
The size of $W$ is $(N+p) \times (N+p)$. Taking advantage of the bipartite structure, fast eigen-decomposition methods can then be used to obtain the spectral embedding. Finally, $k$-means is conducted on the embedding to obtain the clustering result.
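One common fast route exploiting this bipartite structure is to recover the leading eigenvectors of the normalized $W$ from the singular vectors of a degree-normalized $B$, so the $(N+p) \times (N+p)$ matrix is never formed. A sketch under that assumption (the normalization and the use of the top-$k$ left singular vectors follow the standard bipartite-graph trick, not any one specific paper):

```python
import numpy as np

def bipartite_embedding(B, k):
    """Spectral embedding for W = [[0, B], [B^T, 0]] without forming W:
    normalize B by the degrees of both sides, then take the top-k left
    singular vectors of the normalized sub-matrix as the embedding."""
    d1 = B.sum(axis=1)   # degrees of the N data points
    d2 = B.sum(axis=0)   # degrees of the p landmarks
    Bn = B / np.sqrt(d1)[:, None] / np.sqrt(d2)[None, :]
    U, s, Vt = np.linalg.svd(Bn, full_matrices=False)
    return U[:, :k]      # N x k spectral embedding, ready for k-means

B = np.abs(np.random.default_rng(3).normal(size=(200, 10))) + 1e-6
E = bipartite_embedding(B, k=3)
print(E.shape)  # (200, 3)
```

The SVD of the $N \times p$ matrix costs $O(Np^2)$, versus the $O(N^3)$-scale eigen-decomposition of a full graph Laplacian.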

The clustering result is directly tied to the quality of $B$, which consists of the similarities between the data points and the landmarks. Thus, the quality of landmark selection is crucial to the final clustering result.