Low-Rank Doubly Stochastic Matrix Decomposition for Cluster Analysis

Cluster analysis by nonnegative low-rank approximations has made remarkable progress in the past decade. However, most such approximation approaches are still restricted to nonnegative matrix factorization (NMF) and suffer from two drawbacks: 1) they cannot produce balanced partitions for large-scale manifold data, which are common in real-world clustering tasks; 2) most existing NMF-type clustering methods cannot automatically determine the number of clusters. We propose a new low-rank learning method that goes beyond matrix factorization to address these two problems. Our method approximately decomposes a sparse input similarity matrix in a normalized way, and its objective can be used to learn both the cluster assignments and the number of clusters. For efficient optimization, we use a relaxed formulation based on a Data-Cluster-Data random walk, which is also shown to be equivalent to a low-rank factorization of the doubly-stochastically normalized cluster incidence matrix. The probabilistic cluster assignments can thus be learned with a multiplicative majorization-minimization algorithm. Experimental results show that the new method is more accurate both in clustering large-scale manifold data sets and in selecting the number of clusters.
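The Data-Cluster-Data random walk mentioned above can be illustrated with a small numerical sketch. It assumes a uniform prior over data points and reads `W[i, k]` as the probability of assigning data point `i` to cluster `k`; the function name `dcd_similarity` and the toy data are illustrative assumptions, not part of the original work. Under these assumptions, stepping from a data point to a cluster and back to a data point gives a symmetric, doubly stochastic matrix whose rank is at most the number of clusters.

```python
import numpy as np

def dcd_similarity(W):
    """Two-step Data-Cluster-Data random walk matrix (illustrative sketch).

    W[i, k] is interpreted as P(cluster k | data point i), so every row of W
    sums to 1. Assuming a uniform prior over data points, the walk
    data i -> cluster k -> data j has probability
        A[i, j] = sum_k W[i, k] * W[j, k] / sum_v W[v, k].
    """
    W = np.asarray(W, dtype=float)
    col_mass = W.sum(axis=0)        # total assignment mass of each cluster
    return (W / col_mass) @ W.T     # W diag(1 / col_mass) W^T

# Toy example: 6 data points, 3 clusters, random soft assignments.
rng = np.random.default_rng(0)
W = rng.random((6, 3))
W /= W.sum(axis=1, keepdims=True)   # make each row a probability vector

A = dcd_similarity(W)
print(A.sum(axis=0))                # column sums
print(A.sum(axis=1))                # row sums
print(np.linalg.matrix_rank(A))     # at most the number of clusters
```

Because the row sums of `W` equal 1, both the row and column sums of `A` equal 1 by construction, which is what makes the low-rank form comparable to a doubly-stochastically normalized similarity matrix.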
