Non-negative matrix factorization for semi-supervised data clustering

Traditional clustering algorithms are inapplicable to many real-world problems where limited knowledge from domain experts is available. Incorporating this domain knowledge can guide a clustering algorithm and thereby improve the quality of clustering. In this paper, we propose SS-NMF, a semi-supervised non-negative matrix factorization framework for data clustering. In SS-NMF, users provide supervision in the form of pairwise constraints on a few data objects, specifying whether they “must” or “cannot” be clustered together. Through an iterative algorithm, we perform symmetric tri-factorization of the data similarity matrix to infer the clusters. Theoretically, we show the correctness and convergence of SS-NMF. Moreover, we show that SS-NMF provides a general framework for semi-supervised clustering, of which existing approaches can be considered special cases. Through extensive experiments conducted on publicly available datasets, we demonstrate the superior performance of SS-NMF for clustering.
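
To make the idea of constrained symmetric tri-factorization concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the algorithm from the paper. The function name `ss_nmf_sketch`, the weights `alpha` and `beta`, the clipping of the adjusted similarity matrix to non-negative values, and the damped multiplicative updates are all choices made for this example; the abstract does not specify them. The sketch raises the similarity of "must" pairs, lowers that of "cannot" pairs, approximates the adjusted matrix as G S Gᵀ with non-negative factors, and reads each object's cluster from the largest entry in its row of G.

```python
import numpy as np


def ss_nmf_sketch(A, must_link, cannot_link, k,
                  alpha=1.0, beta=1.0, n_iter=300, seed=0, eps=1e-9):
    """Illustrative constrained symmetric tri-factorization A_tilde ~ G S G^T.

    All parameter names and update rules are assumptions for this sketch.
    """
    A = np.asarray(A, dtype=float)
    n = A.shape[0]

    # Fold the pairwise constraints into the similarity matrix (assumed scheme):
    # reward must-link pairs, penalize cannot-link pairs, keep it symmetric.
    A_tilde = A.copy()
    for i, j in must_link:
        A_tilde[i, j] += alpha
        A_tilde[j, i] += alpha
    for i, j in cannot_link:
        A_tilde[i, j] -= beta
        A_tilde[j, i] -= beta
    # Assumption: clip so the multiplicative updates stay non-negative.
    A_tilde = np.maximum(A_tilde, 0.0)

    rng = np.random.default_rng(seed)
    G = rng.random((n, k)) + 0.1      # cluster indicator factor (n x k)
    S = rng.random((k, k)) + 0.1      # cluster-to-cluster factor (k x k)
    S = (S + S.T) / 2.0               # start from a symmetric S

    for _ in range(n_iter):
        # Multiplicative update for S with G held fixed.
        GtG = G.T @ G
        S *= (G.T @ A_tilde @ G) / (GtG @ S @ GtG + eps)

        # Damped (square-root) multiplicative update for G with S held fixed.
        numer = A_tilde @ G @ S
        denom = G @ (S @ GtG @ S) + eps
        G *= np.sqrt(numer / denom)

    labels = G.argmax(axis=1)         # hard assignment from the soft factor
    return labels, G, S
```

Calling `ss_nmf_sketch(A, must_link=[(0, 1)], cannot_link=[(0, 3)], k=2)` on a symmetric non-negative similarity matrix `A` returns hard labels together with the factors G and S. Because the objective is non-convex, results can depend on the random seed, so in practice one might run several seeds and keep the factorization with the lowest reconstruction error.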
