Paper information - A Comparative Study of Clustering versus Classification over Reuter's Collection

People have plenty of information at their disposal. Even with the advent of search engines, however, analyzing, understanding, and selecting the relevant information remains difficult. Clustering techniques are promising in this respect because they group related information in an organized way. This paper addresses some problems of existing document clustering techniques and presents the “best star” algorithm, which can be used to group and understand chunks of information and to find the most relevant ones.

José Palazzo Moreira de Oliveira | Leandro Krug Wives | Stanley Loh
