Non-negative and sparse spectral clustering

Spectral clustering aims to partition a data set into several groups by using the eigenvectors of the graph Laplacian, so that data points in the same group are similar while data points in different groups are dissimilar to each other. Spectral clustering is simple to implement and has many advantages over traditional clustering algorithms such as k-means. Non-negative matrix factorization (NMF) factorizes a non-negative data matrix into a product of two non-negative, lower-rank matrices, achieving dimension reduction and a parts-based data representation. In this work, we prove that spectral clustering is, under certain conditions, equivalent to NMF. Unlike previous work, we formulate spectral clustering as a factorization of the data matrix (or a scaled data matrix) rather than as a symmetric factorization of the symmetric pairwise similarity matrix. Under the NMF framework, where regularization can be easily incorporated into spectral clustering, we propose several non-negative and sparse spectral clustering algorithms. Empirical studies on real-world data show that the proposed algorithms achieve much better clustering accuracy than state-of-the-art methods such as ratio-cut and normalized-cut spectral clustering and non-negative Laplacian embedding.

Highlights
- We prove that spectral clustering is equivalent to NMF under certain conditions.
- Under the NMF framework, we propose a non-negative sparse spectral clustering model.
- We propose several algorithms to solve the proposed models.
- Experimental results on real-world data show much better clustering performance.
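
As a concrete illustration of the connection the abstract builds on, here is a minimal sketch, not the paper's algorithm, contrasting the two views: classical spectral clustering, which embeds points via eigenvectors of the graph Laplacian and then needs a k-means step, and the earlier symmetric-NMF view of prior work (Ding et al.), which factorizes the similarity matrix as W ≈ HH^T so that the non-negative factor H acts directly as a soft cluster indicator. All names and parameters below (rbf_similarity, sigma, iters) are illustrative assumptions; the paper itself factorizes the (scaled) data matrix instead of the symmetric similarity matrix.

```python
import numpy as np

def rbf_similarity(X, sigma=1.0):
    """Pairwise Gaussian (RBF) similarity matrix for rows of X."""
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq / (2.0 * sigma ** 2))

def spectral_embedding(W, k):
    """Eigenvectors of the unnormalized Laplacian L = D - W for the
    k smallest eigenvalues: the relaxed (signed) cluster indicators."""
    D = np.diag(W.sum(axis=1))
    vals, vecs = np.linalg.eigh(D - W)   # eigenvalues in ascending order
    return vecs[:, :k]                   # still needs k-means on the rows

def symnmf(W, k, iters=200, eps=1e-9, seed=0):
    """Symmetric NMF, W ~= H H^T, via the multiplicative update of
    Ding et al.: H <- H * (1/2) * (1 + (W H) / (H H^T H))."""
    rng = np.random.default_rng(seed)
    H = rng.random((W.shape[0], k))
    for _ in range(iters):
        H *= 0.5 * (1.0 + (W @ H) / (H @ (H.T @ H) + eps))
    return H

# Toy data: two well-separated blobs (hypothetical example, not the
# paper's benchmark data).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.3, (20, 2)),
               rng.normal(3.0, 0.3, (20, 2))])
W = rbf_similarity(X, sigma=1.0)

U = spectral_embedding(W, k=2)   # classical route: run k-means on rows of U
H = symnmf(W, k=2)               # NMF route: H is non-negative
labels = H.argmax(axis=1)        # hard clustering falls out of a row argmax
print(labels)
```

Because H is non-negative, a hard clustering is read off by a row-wise argmax with no extra k-means step; this interpretability of the factor is precisely the property that non-negative spectral clustering formulations exploit, and it is where regularization (e.g., sparsity on H) is naturally added under the NMF framework.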
