Document Clustering Description Extraction and Its Application

Document clustering description is the problem of labeling the clusters produced by clustering a document collection. Good labels help users decide whether a given cluster is relevant to their information needs. To address the poor readability of document clustering results, a machine-learning-based method for automatically labeling document clusters is proposed. The application of clustering description extraction to the construction of topic digital libraries is introduced first. The descriptive results of five models are then analyzed individually, and their performance is compared.
