Incremental clustering for dynamic information processing

Clustering very large document databases is useful for both searching and browsing. Because such databases change over time, their clusters must be updated periodically. This paper introduces an algorithm for incremental clustering and presents its complexity and cost analysis together with an investigation of its expected behavior. Empirical testing shows that the algorithm is cost-effective and generates statistically valid clusters that are compatible with those produced by reclustering. The experimental evidence shows that the algorithm creates an effective and efficient retrieval environment.
