Semi-supervised Hierarchical Co-clustering

Hierarchical co-clustering aims to generate dendrograms for both the rows and the columns of an input data matrix. Applying plain hierarchical co-clustering to document clustering is limited in two ways: document collections involve very large numbers of feature terms and documents, and the method ignores the semantic relations between feature terms. In this paper, a semi-supervised hierarchical co-clustering algorithm is proposed. In the first step, feature terms are clustered using a small amount of supervised information. In the second step, the clustered feature terms are merged into new feature attributes. In the last step, the documents and the merged feature terms are clustered with a hierarchical co-clustering algorithm, in which semantic information is used to measure similarity. Experimental results show that the proposed algorithm is effective and efficient.
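The three steps above can be sketched in miniature as follows. This is only an illustrative outline under strong assumptions: the toy term-document matrix, the must-link constraints standing in for the "little supervised information", and the centroid-linkage agglomeration used in the last step are all placeholders, not the paper's actual algorithm (which also exploits semantic similarity between terms).

```python
import numpy as np

# Hypothetical toy term-document matrix: rows = documents, columns = feature terms.
X = np.array([
    [2, 1, 0, 0],
    [1, 2, 0, 1],
    [0, 0, 3, 2],
    [0, 1, 2, 2],
], dtype=float)

# Step 1: group feature terms using a little supervision, expressed here as
# must-link constraints (pairs of term indices assumed to be related).
must_link = [(0, 1), (2, 3)]  # assumed toy constraints

# Union-find to propagate the must-link constraints into term groups.
parent = list(range(X.shape[1]))

def find(i):
    while parent[i] != i:
        parent[i] = parent[parent[i]]
        i = parent[i]
    return i

for a, b in must_link:
    parent[find(a)] = find(b)

groups = {}
for t in range(X.shape[1]):
    groups.setdefault(find(t), []).append(t)

# Step 2: merge each term group into one new feature attribute (column sum).
merged = np.column_stack([X[:, cols].sum(axis=1) for cols in groups.values()])

# Step 3: agglomerative clustering of the documents on the merged features,
# standing in for the full hierarchical co-clustering step.
clusters = [[i] for i in range(merged.shape[0])]

def centroid(c):
    return merged[c].mean(axis=0)

while len(clusters) > 2:  # stop at two document clusters for this toy example
    best, pair = np.inf, None
    for i in range(len(clusters)):
        for j in range(i + 1, len(clusters)):
            d = np.linalg.norm(centroid(clusters[i]) - centroid(clusters[j]))
            if d < best:
                best, pair = d, (i, j)
    i, j = pair
    clusters[i] = clusters[i] + clusters[j]
    del clusters[j]

print(sorted(sorted(c) for c in clusters))  # → [[0, 1], [2, 3]]
```

Merging constrained term groups before co-clustering is what shrinks the feature space; the hierarchical step then operates on far fewer columns than the raw vocabulary.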
