Clustering analysis using manifold kernel concept factorization

Abstract The volume of documents and images has grown exponentially in past decades, and grouping them into meaningful clusters has become vitally important. Previous work has shown that matrix factorization can yield encouraging clustering results, but most existing techniques do not fully respect the manifold structure of the data, which carries rich spatial information. Kernel learning, in turn, is well suited to unfolding nonlinear structure. In this paper we therefore propose a novel clustering approach, Manifold Kernel Concept Factorization (MKCF), which incorporates manifold kernel learning into concept factorization and thereby encodes the local geometrical structure in the kernel space. The method preserves the semantic structure of the data using the graph Laplacian, and the nonlinear manifold learning in the warped RKHS can reflect the underlying local geometry of the data. As a result, the extracted concepts are consistent with the intrinsic manifold structure, which greatly benefits grouping documents and images that share a concept into the same cluster. Extensive empirical studies demonstrate that MKCF achieves more satisfactory clustering performance and derives a better-represented lower-dimensional data space.
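
The pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm, whose update rules are not given here. The sketch assumes a Gaussian base kernel, a binary k-nearest-neighbor graph, the standard Laplacian-deformed ("warped RKHS") Gram matrix K - K(I + γLK)^{-1}γLK, and Convex-NMF-style multiplicative updates swapped in because the deformed kernel may contain negative entries. All function names and the parameters `sigma`, `k`, `gamma`, and `n_iter` are hypothetical.

```python
import numpy as np

def gaussian_kernel(X, sigma=1.0):
    """Gaussian Gram matrix and squared distances (assumed base kernel)."""
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    return np.exp(-d2 / (2.0 * sigma ** 2)), d2

def knn_laplacian(d2, k=5):
    """Unnormalized Laplacian L = D - A of a symmetrized binary k-NN graph."""
    n = d2.shape[0]
    idx = np.argsort(d2, axis=1)[:, 1:k + 1]      # column 0 is normally the point itself
    A = np.zeros((n, n))
    A[np.repeat(np.arange(n), k), idx.ravel()] = 1.0
    A = np.maximum(A, A.T)
    np.fill_diagonal(A, 0.0)
    return np.diag(A.sum(axis=1)) - A

def deformed_kernel(K, L, gamma=0.1):
    """Gram matrix of the Laplacian-warped RKHS on the sample points."""
    n = K.shape[0]
    M = gamma * L
    Kd = K - K @ np.linalg.solve(np.eye(n) + M @ K, M @ K)
    return 0.5 * (Kd + Kd.T)                      # remove round-off asymmetry

def cf_objective(K, W, V):
    """||phi(X) - phi(X) W V^T||^2 written in terms of the Gram matrix K."""
    WV = W @ V.T
    return np.trace(K - 2.0 * K @ WV + WV.T @ K @ WV)

def concept_factorization(K, n_concepts, n_iter=200, eps=1e-10, seed=0):
    """Kernel concept factorization with Convex-NMF-style updates (assumption)."""
    rng = np.random.default_rng(seed)
    n = K.shape[0]
    W = rng.random((n, n_concepts))
    V = rng.random((n, n_concepts))
    Kp = (np.abs(K) + K) / 2.0                    # positive part of K
    Km = (np.abs(K) - K) / 2.0                    # negative part of K
    for _ in range(n_iter):
        V *= np.sqrt((Kp @ W + V @ (W.T @ Km @ W)) /
                     (Km @ W + V @ (W.T @ Kp @ W) + eps))
        VtV = V.T @ V
        W *= np.sqrt((Kp @ V + Km @ W @ VtV) /
                     (Km @ V + Kp @ W @ VtV + eps))
    return W, V

# Toy usage: two synthetic 2-D blobs, two concepts.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.5, (20, 2)), rng.normal(5.0, 0.5, (20, 2))])
K, d2 = gaussian_kernel(X, sigma=1.0)
Kd = deformed_kernel(K, knn_laplacian(d2, k=5), gamma=0.1)
W, V = concept_factorization(Kd, n_concepts=2)
labels = np.argmax(V, axis=1)                     # cluster = strongest concept per sample
```

Here each row of `V` gives a sample's nonnegative weights over the concepts, so taking the arg-max is one simple way to read off cluster labels; running k-means on `V` would be another reasonable choice.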
