Pairwise Constraints-Guided Non-negative Matrix Factorization for Document Clustering

Non-negative Matrix Factorization (NMF) has proven effective in text mining. However, because standard NMF is an unsupervised component analysis technique, it cannot exploit prior constraints, which are beneficial to clustering and classification tasks. In this paper, we address the text clustering problem with a novel strategy called Pairwise Constraints-guided Non-negative Matrix Factorization (PCNMF for short). Unlike the traditional NMF method, the proposed method captures the abundant prior constraints available in the original space, which makes it more effective for clustering and information retrieval and strengthens the discriminative capability of the reduced space. With an appropriate transformation, PCNMF is formulated as a new optimization problem that can be solved efficiently by an iterative approach. The cluster membership of each document can then be determined as easily as in standard NMF. Empirical studies on a benchmark document corpus demonstrate appealing results.
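To make the idea concrete, the following is a minimal sketch of how pairwise constraints might be folded into an NMF objective. It is not the PCNMF formulation, which the abstract does not state. It assumes a graph-regularized objective, ||X - WH||_F^2 + lam * (must-link smoothness penalty + cannot-link overlap penalty), solved with multiplicative updates in the style of standard NMF. The function name `constrained_nmf`, the parameter `lam`, and the toy matrix are all hypothetical. Cluster membership is read off as in standard NMF: each document is assigned to the factor with the largest coefficient.

```python
import numpy as np


def constrained_nmf(X, k, must_link=(), cannot_link=(), lam=0.1,
                    n_iter=200, seed=0):
    """Illustrative constraint-regularized NMF (an assumption, not PCNMF).

    X is a non-negative (n_terms, n_docs) matrix. Returns W (n_terms, k)
    and H (k, n_docs), both non-negative.
    """
    rng = np.random.default_rng(seed)
    n_terms, n_docs = X.shape
    W = rng.random((n_terms, k)) + 1e-3
    H = rng.random((k, n_docs)) + 1e-3

    # Symmetric 0/1 adjacency matrices built from the document pairs.
    A = np.zeros((n_docs, n_docs))  # must-link pairs
    C = np.zeros((n_docs, n_docs))  # cannot-link pairs
    for i, j in must_link:
        A[i, j] = A[j, i] = 1.0
    for i, j in cannot_link:
        C[i, j] = C[j, i] = 1.0
    D = np.diag(A.sum(axis=1))  # degree matrix of the must-link graph

    eps = 1e-10  # guards against division by zero
    for _ in range(n_iter):
        # Standard multiplicative update for the basis matrix W.
        W *= (X @ H.T) / (W @ (H @ H.T) + eps)
        # Update for H: must-link terms pull linked columns together,
        # cannot-link terms penalize overlap between linked columns.
        numer = W.T @ X + lam * (H @ A)
        denom = W.T @ W @ H + lam * (H @ (D + C)) + eps
        H *= numer / denom
    return W, H


# Toy term-document matrix: documents 0-2 and 3-5 use disjoint vocabularies.
X = np.array([
    [2.0, 1.0, 2.0, 0.0, 0.0, 0.0],
    [1.0, 2.0, 1.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 1.0, 2.0, 1.0],
    [0.0, 0.0, 0.0, 2.0, 1.0, 2.0],
])
W, H = constrained_nmf(X, k=2, must_link=[(0, 1)], cannot_link=[(0, 3)])

# As in standard NMF clustering, each document takes the index of its
# largest coefficient in H as its cluster label.
labels = H.argmax(axis=0)
print(labels)
```

The multiplicative form keeps W and H non-negative as long as they start positive, which is why the constraint terms are split between numerator and denominator rather than applied as a signed gradient step.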
