Paper information - Non-negative matrix factorization for semi-supervised data clustering


Traditional clustering algorithms are inapplicable to many real-world problems where limited knowledge from domain experts is available. Incorporating this domain knowledge can guide a clustering algorithm and consequently improve the quality of clustering. In this paper, we propose SS-NMF, a semi-supervised non-negative matrix factorization framework for data clustering. In SS-NMF, users provide supervision in the form of pairwise constraints on a few data objects, specifying whether they “must” or “cannot” be clustered together. Through an iterative algorithm, we perform symmetric tri-factorization of the data similarity matrix to infer the clusters. Theoretically, we show the correctness and convergence of SS-NMF. Moreover, we show that SS-NMF provides a general framework for semi-supervised clustering, of which existing approaches can be considered special cases. Through extensive experiments conducted on publicly available datasets, we demonstrate the superior performance of SS-NMF for clustering.
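To make the idea in the abstract concrete, the following is a minimal sketch of a symmetric tri-factorization A ≈ G S Gᵀ applied to a similarity matrix that has been adjusted by pairwise constraints. It is an illustration under stated assumptions, not the paper's implementation: the function names `apply_constraints` and `symmetric_tri_nmf`, the `reward` and `penalty` weights, the clipping step, and the multiplicative update rules (written in the general style of NMF multiplicative updates) are all choices made for this example, since the abstract does not specify them.

```python
import numpy as np


def apply_constraints(A, must_link, cannot_link, reward=1.0, penalty=1.0):
    """Return a copy of similarity matrix A adjusted by pairwise constraints.

    Hypothetical helper: must-link pairs gain similarity, cannot-link pairs
    lose it. The weights and the clipping are assumptions of this sketch.
    """
    A_adj = A.astype(float).copy()
    for i, j in must_link:
        A_adj[i, j] += reward
        A_adj[j, i] += reward
    for i, j in cannot_link:
        A_adj[i, j] -= penalty
        A_adj[j, i] -= penalty
    # Clip so the matrix stays non-negative (a simplification of this sketch).
    return np.clip(A_adj, 0.0, None)


def symmetric_tri_nmf(A, k, n_iter=200, eps=1e-9, seed=0):
    """Approximate a symmetric non-negative A as G @ S @ G.T.

    G is an n x k cluster indicator matrix and S is a k x k matrix, both
    non-negative. The multiplicative updates are assumed, not quoted.
    """
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    G = rng.random((n, k)) + eps
    S = rng.random((k, k)) + eps
    S = (S + S.T) / 2  # start from a symmetric S
    for _ in range(n_iter):
        GtG = G.T @ G
        S *= (G.T @ A @ G) / (GtG @ S @ GtG + eps)
        AGS = A @ G @ S
        G *= np.sqrt(AGS / (G @ (G.T @ AGS) + eps))
    return G, S


# Toy similarity matrix: objects 0-2 and 3-5 form two loose groups.
A = np.full((6, 6), 0.1)
A[:3, :3] = 0.9
A[3:, 3:] = 0.9
np.fill_diagonal(A, 1.0)

# Illustrative supervision: 0 and 1 must be together, 2 and 3 cannot.
A_adj = apply_constraints(A, must_link=[(0, 1)], cannot_link=[(2, 3)])
G, S = symmetric_tri_nmf(A_adj, k=2)

# Assign each object to the cluster with the largest entry in its row of G.
labels = G.argmax(axis=1)
print(labels)
```

Reading cluster labels from the largest entry in each row of G is one common convention for NMF-based clustering; other normalizations of G before the assignment are possible and may behave differently.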
