Simultaneous Semi-NMF and PCA for Clustering

Cluster analysis is often carried out in combination with dimension reduction. The Semi-Non-negative Matrix Factorization (Semi-NMF) that learns a low-dimensional representation of a data set lends itself to a clustering interpretation. In this work we propose a novel approach to finding an optimal subspace of multi-dimensional variables for identifying a partition of the set of objects. The use of a low-dimensional representation can be of help in providing simpler and more interpretable solutions. We show that by doing so, our model is able to learn low-dimensional representations that are better suited for clustering, outperforming not only Semi-NMF, but also other NMF variants.

[1]  Martine D. F. Schlag,et al.  Spectral K-way ratio-cut partitioning and clustering , 1994, IEEE Trans. Comput. Aided Des. Integr. Circuits Syst..

[2]  Chris H. Q. Ding,et al.  Nonnegative Matrix Factorization and Probabilistic Latent Semantic Indexing: Equivalence Chi-Square Statistic, and a Hybrid Method , 2006, AAAI.

[3]  Haesun Park,et al.  Sparse Nonnegative Matrix Factorization for Clustering , 2008 .

[4]  Mikhail Belkin,et al.  Laplacian Eigenmaps and Spectral Techniques for Embedding and Clustering , 2001, NIPS.

[5]  Inderjit S. Dhillon,et al.  Concept Decompositions for Large Sparse Text Data Using Clustering , 2004, Machine Learning.

[6]  Richard A. Harshman,et al.  Indexing by Latent Semantic Analysis , 1990, J. Am. Soc. Inf. Sci..

[7]  Chris H. Q. Ding,et al.  K-means clustering via principal component analysis , 2004, ICML.

[8]  Chris H. Q. Ding,et al.  On the Equivalence of Nonnegative Matrix Factorization and Spectral Clustering , 2005, SDM.

[9]  Amnon Shashua,et al.  A unifying approach to hard and probabilistic clustering , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[10]  Chris H. Q. Ding,et al.  Convex and Semi-Nonnegative Matrix Factorizations , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[11]  Wei-Chien Chang On using Principal Components before Separating a Mixture of Two Multivariate Normal Distributions , 1983 .

[12]  Joydeep Ghosh,et al.  Cluster Ensembles --- A Knowledge Reuse Framework for Combining Multiple Partitions , 2002, J. Mach. Learn. Res..

[13]  张振跃,et al.  Principal Manifolds and Nonlinear Dimensionality Reduction via Tangent Space Alignment , 2004 .

[14]  Tao Li,et al.  The Relationships Among Various Nonnegative Matrix Factorization Methods for Clustering , 2006, Sixth International Conference on Data Mining (ICDM'06).

[15]  P. Arabie,et al.  Cluster analysis in marketing research , 1994 .

[16]  Thomas Hofmann,et al.  Unsupervised Learning by Probabilistic Latent Semantic Analysis , 2004, Machine Learning.

[17]  H. Sebastian Seung,et al.  Learning the parts of objects by non-negative matrix factorization , 1999, Nature.

[18]  Quanquan Gu,et al.  Co-clustering on manifolds , 2009, KDD.