Orthogonal Nonnegative Matrix Tri-factorization for Semi-supervised Document Co-clustering

Semi-supervised clustering is often viewed as using labeled data to aid the clustering process. However, existing algorithms fail to consider dual constraints between data points (e.g., documents) and features (e.g., words). To address this problem, we propose OSS-NMF, a novel semi-supervised document co-clustering model based on orthogonal nonnegative matrix tri-factorization. Our model incorporates prior knowledge on both the document side and the word side to aid the construction of the new word-category and document-cluster matrices. In addition, we prove the correctness and convergence of our model to demonstrate its mathematical rigor. Our experimental evaluations show that the proposed document clustering model achieves remarkable performance improvements with certain constraints.
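
For orientation, the generic orthogonal nonnegative matrix tri-factorization objective that models of this kind build on can be written as below. This is a sketch of the standard unconstrained formulation only, not the OSS-NMF objective itself, which the abstract says additionally encodes document-side and word-side prior knowledge. The symbols are assumptions introduced here for illustration: X is taken to be an m-by-n nonnegative word-document matrix, F an m-by-k word-category matrix, G an n-by-l document-cluster matrix, and S a k-by-l matrix linking word categories to document clusters.

```latex
\min_{F \ge 0,\; S \ge 0,\; G \ge 0} \;
  \lVert X - F S G^{\top} \rVert_F^{2}
\quad \text{s.t.} \quad
  F^{\top} F = I, \qquad G^{\top} G = I
```

Under these assumptions, the orthogonality constraints push each row of F and G toward a single dominant entry, so the factors can be read as near-hard assignments of words to categories and of documents to clusters.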
