Constrained neighborhood preserving concept factorization for data representation

Matrix factorization based techniques, such as nonnegative matrix factorization (NMF) and concept factorization (CF), have attracted a great deal of attention in recent years, mainly due to their ability to perform dimensionality reduction and produce sparse data representations. Both techniques are unsupervised and thus make no use of a priori knowledge to guide the clustering process, which can lead to inferior performance in some scenarios. As a remedy, a semi-supervised method called Pairwise Constrained Concept Factorization (PCCF) was introduced to incorporate pairwise constraints into the CF framework. Despite its improved performance, PCCF uses only the a priori knowledge and neglects the proximity information of the whole data distribution; as a result, it can still perform rather poorly (though slightly better than CF) when only limited a priori information is available. To address this issue, we propose in this paper a novel method called Constrained Neighborhood Preserving Concept Factorization (CNPCF). CNPCF utilizes both a priori knowledge and the local geometric structure of the dataset to guide its clustering. Experimental studies on three real-world clustering tasks demonstrate that our method yields a better data representation and achieves much improved clustering performance, in terms of accuracy and mutual information, compared to state-of-the-art techniques.
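The abstract does not state CNPCF's objective or update rules, so the following is only a minimal sketch of the two ingredients it describes combining, under stated assumptions. It uses the standard CF model X ≈ XWV^T with kernel K = X^T X and its multiplicative updates, plus a k-nearest-neighbour graph Laplacian penalty lam * tr(V^T L V) of the kind used in locally consistent CF to encode local geometric structure. The pairwise-constraint (a priori knowledge) term of PCCF/CNPCF is deliberately omitted. The function names `knn_graph` and `graph_regularized_cf` and the parameters `lam` and `k` are hypothetical, chosen for illustration.

```python
import numpy as np


def knn_graph(X, k=5):
    """Symmetric 0/1 k-nearest-neighbour affinity matrix S over the columns of X."""
    n = X.shape[1]
    # Squared Euclidean distances between data points (columns of X).
    sq = np.sum(X**2, axis=0)
    dist = sq[:, None] + sq[None, :] - 2 * X.T @ X
    np.fill_diagonal(dist, np.inf)  # a point is not its own neighbour
    nn = np.argsort(dist, axis=1)[:, :k]
    S = np.zeros((n, n))
    S[np.repeat(np.arange(n), k), nn.ravel()] = 1.0
    return np.maximum(S, S.T)  # symmetrize the neighbourhood relation


def graph_regularized_cf(X, r, lam=0.0, k=5, n_iter=300, eps=1e-10, seed=0):
    """Concept factorization X ~ X W V^T with an optional graph term.

    lam = 0 gives plain CF multiplicative updates; lam > 0 adds the
    Laplacian penalty lam * tr(V^T (D - S) V) in the style of locally
    consistent CF. This is an illustration, NOT the CNPCF algorithm.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[1]
    K = X.T @ X  # kernel matrix; nonnegative when X is nonnegative
    W = rng.random((n, r))
    V = rng.random((n, r))
    if lam > 0:
        S = knn_graph(X, k)
        D = np.diag(S.sum(axis=1))
    else:
        S = D = np.zeros((n, n))
    for _ in range(n_iter):
        # W <- W * (K V) / (K W V^T V)
        W *= (K @ V) / (K @ W @ (V.T @ V) + eps)
        # V <- V * (K W + lam S V) / (V W^T K W + lam D V)
        V *= (K @ W + lam * S @ V) / (V @ (W.T @ K @ W) + lam * D @ V + eps)
    return W, V
```

Cluster labels can be read off as `V.argmax(axis=1)`. Setting `lam=0` recovers plain CF, which makes it straightforward to compare the effect of adding neighbourhood information on the same data.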
