Fast in-memory spectral clustering using a fixed-size approach

Spectral clustering is a successful approach to data clustering. Despite its high performance on complex tasks, it is often passed over in favor of the less accurate k-means algorithm because of its computational cost. In this article we present a fast in-memory spectral clustering algorithm that can handle millions of data points on a desktop PC. The proposed technique relies on a kernel-based formulation of the spectral clustering problem, known as kernel spectral clustering. In particular, we use a fixed-size approach, based on an approximation of the feature map via the Nyström method, to solve the primal optimization problem. We conducted experiments on several small- and large-scale real-world datasets to show the computational efficiency and clustering quality of the proposed algorithm.
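The two ingredients named above can be sketched in a few lines of NumPy: an explicit feature map approximated with the Nyström method, and a clustering model solved in the primal. This is a minimal sketch under stated assumptions, not the authors' implementation. It assumes an RBF kernel with bandwidth `sigma`, and it assumes prototypes are supplied by the caller, because the abstract does not say how they are selected. It also assumes the weighted kernel PCA form of kernel spectral clustering, with the inverse degree matrix as weighting. Under that form, the primal solution reduces to an m × m eigenvalue problem, where m is the number of prototypes. The function names `rbf_kernel`, `nystrom_feature_map` and `fixed_size_ksc` are hypothetical.

```python
import numpy as np


def rbf_kernel(A, B, sigma):
    """Gaussian (RBF) kernel matrix between the rows of A and the rows of B."""
    sq = (np.sum(A ** 2, axis=1)[:, None]
          + np.sum(B ** 2, axis=1)[None, :]
          - 2.0 * A @ B.T)
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma ** 2))


def nystrom_feature_map(X, prototypes, sigma, tol=1e-10):
    """Approximate explicit feature map phi_hat(x) from m prototypes (Nystrom).

    phi_hat(x) = Lambda^{-1/2} U^T k(x, prototypes), with U, Lambda from the
    eigendecomposition of the m x m prototype kernel matrix, so that
    phi_hat(x)^T phi_hat(z) ~= k(x, z). How the prototypes are chosen is left
    to the caller (an assumption of this sketch).
    """
    K_mm = rbf_kernel(prototypes, prototypes, sigma)
    evals, U = np.linalg.eigh(K_mm)
    keep = evals > tol * evals.max()                 # drop numerically null directions
    K_nm = rbf_kernel(X, prototypes, sigma)
    return (K_nm @ U[:, keep]) / np.sqrt(evals[keep])  # shape (N, m')


def fixed_size_ksc(Phi, k):
    """Cluster the rows of Phi into k groups by solving a primal eigenproblem.

    Assumed model (weighted kernel PCA form of kernel spectral clustering):
        Phi^T D^{-1} M_D Phi w = lambda w,   e = Phi w + b,
    with D the approximate degree matrix and M_D the weighted centering matrix.
    """
    d = Phi @ Phi.sum(axis=0)        # degrees d = Omega_hat 1, with Omega_hat = Phi Phi^T
    if np.any(d <= 0):
        raise ValueError("non-positive approximate degree; try more prototypes")
    dinv = 1.0 / d
    s = dinv.sum()                   # 1^T D^{-1} 1
    mu = Phi.T @ dinv                # Phi^T D^{-1} 1
    # Phi^T D^{-1} M_D Phi, written as a symmetric m' x m' matrix
    C = (Phi * dinv[:, None]).T @ Phi - np.outer(mu, mu) / s
    _, W = np.linalg.eigh(C)         # eigenvalues in ascending order
    W = W[:, ::-1][:, : k - 1]       # eigenvectors of the k-1 largest eigenvalues
    b = -(mu @ W) / s                # bias terms from the constraint 1^T D^{-1} e = 0
    E = Phi @ W + b                  # score variables, shape (N, k-1)

    # Sign-based encoding: the k most frequent sign patterns form the codebook,
    # and each point is assigned to the codeword at minimum Hamming distance.
    codes = np.where(E >= 0, 1, -1)
    uniq, counts = np.unique(codes, axis=0, return_counts=True)
    codebook = uniq[np.argsort(-counts)[:k]]
    hamming = (codes[:, None, :] != codebook[None, :, :]).sum(axis=2)
    return hamming.argmin(axis=1)
```

In this sketch, only the N × m' feature matrix and an m' × m' eigenproblem are ever formed, so memory grows linearly in N for a fixed number of prototypes. That linear scaling is what makes an in-memory approach to very large N plausible. New points could be assigned with the same `W` and `b` by computing their own Nyström features.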
