Static and dynamic information organization with star clusters

In this paper we present a system for static and dynamic information organization and evaluate it on TREC data. We introduce the offline and online star clustering algorithms for information organization. In our evaluation experiments, the offline star algorithm outperforms the single-link and average-link clustering algorithms. Because the star algorithm is also highly efficient and simple to implement, we advocate its use for tasks that require clustering, such as information organization, browsing, filtering, routing, topic tracking, and new topic detection.
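The abstract does not describe how the star algorithm works, so the following is only a minimal sketch of the greedy star cover as it is commonly described, not the authors' implementation. It assumes a precomputed pairwise similarity matrix and a similarity threshold; the names `star_clusters`, `sim`, and `sigma` are hypothetical and chosen for illustration.

```python
from typing import Dict, List, Set


def star_clusters(sim: List[List[float]], sigma: float) -> Dict[int, Set[int]]:
    """Greedy star cover of a thresholded similarity graph (illustrative sketch).

    sim   -- symmetric matrix of pairwise document similarities (assumed input)
    sigma -- similarity threshold; pairs with sim >= sigma are joined by an edge
    Returns a mapping from each star center to the set of documents in its cluster.
    """
    n = len(sim)
    # Build the thresholded graph: an edge joins documents that are similar enough.
    adj = [
        {j for j in range(n) if j != i and sim[i][j] >= sigma}
        for i in range(n)
    ]

    marked = [False] * n
    clusters: Dict[int, Set[int]] = {}

    # Visit vertices from highest to lowest degree (ties broken by index).
    for v in sorted(range(n), key=lambda i: (-len(adj[i]), i)):
        if marked[v]:
            continue  # already covered as a satellite of an earlier center
        # v becomes a star center; all of its neighbors become satellites.
        marked[v] = True
        clusters[v] = {v} | adj[v]
        for u in adj[v]:
            marked[u] = True

    return clusters


if __name__ == "__main__":
    # Toy similarities: documents 0, 1, 2 are related through 0; document 3 is isolated.
    toy = [
        [1.0, 0.9, 0.8, 0.0],
        [0.9, 1.0, 0.1, 0.0],
        [0.8, 0.1, 1.0, 0.0],
        [0.0, 0.0, 0.0, 1.0],
    ]
    print(star_clusters(toy, sigma=0.5))
```

In this sketch a satellite may be adjacent to more than one center, so clusters can overlap, and an isolated document forms a cluster of its own.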
