Scalable information organization

We present three scalable, sampling-based extensions of the star algorithm for information organization. The star algorithm organizes a document collection into clusters that are naturally induced by the topic structure of the collection, using a computationally efficient cover by dense subgraphs. We also provide supporting data from extensive experiments.
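To make the "cover by dense subgraphs" concrete, the following is a minimal sketch of a greedy star cover on a thresholded similarity graph. It illustrates only the basic (non-sampled) star idea, not the three extensions presented here. The function name `star_cover`, the adjacency-set input format, and the tie-breaking by vertex name are assumptions made for this illustration.

```python
from typing import Dict, Set


def star_cover(adj: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Greedy star cover of an undirected graph (hypothetical helper).

    adj maps each document id to the set of documents whose similarity
    to it exceeds some chosen threshold (the thresholding is assumed to
    have been done already). Returns a map from each star center to its
    satellites.
    """
    unmarked = set(adj)
    stars: Dict[str, Set[str]] = {}
    # Visit vertices by decreasing degree; ties are broken by name so the
    # result is deterministic (an assumption of this sketch).
    for v in sorted(adj, key=lambda u: (-len(adj[u]), u)):
        if v in unmarked:
            # v becomes a star center; all of its neighbors are satellites,
            # including ones already covered by an earlier star.
            stars[v] = set(adj[v])
            unmarked.discard(v)
            unmarked -= adj[v]
    return stars


if __name__ == "__main__":
    graph = {
        "a": {"b", "c", "d"},
        "b": {"a"},
        "c": {"a", "e"},
        "d": {"a"},
        "e": {"c", "f"},
        "f": {"e"},
    }
    for center, satellites in star_cover(graph).items():
        print(center, sorted(satellites))
```

In this sketch clusters may overlap (document `c` is a satellite of both `a` and `e`), and no two star centers are adjacent, since a vertex can only become a center while it is still unmarked.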