Paper Information - Analysis of Clustering Algorithms for Web-Based Search

Analysis of Clustering Algorithms for Web-Based Search

Automatic document categorization plays a key role in the development of future interfaces for Web-based search. Clustering algorithms are considered a technology capable of mastering this "ad-hoc" categorization task. This paper presents the results of a comprehensive analysis of clustering algorithms in connection with document categorization. The contributions relate to exemplar-based, hierarchical, and density-based clustering algorithms. In particular, we contrast ideal and real clustering settings and present runtime results based on efficient implementations of the investigated algorithms.

Benno Stein | Sven Meyer zu Eissen
