Paper information - Almost-constant-time clustering of arbitrary corpus subsets

Almost-constant-time clustering of arbitrary corpus subsets

Methods exist for constant-time clustering of corpus subsets selected via Scatter/Gather browsing [3]. In this paper we expand on those techniques, giving an algorithm for almost-constant-time clustering of arbitrary corpus subsets. This algorithm is never slower than clustering the document set from scratch, and for medium-sized and large sets it is significantly faster. The algorithm is useful for clustering arbitrary subsets of large corpora (obtained, for instance, by a boolean search) quickly enough to be useful in an interactive setting.

Jan O. Pedersen | Craig Silverstein

[1] David R. Karger, et al. Scatter/Gather: a cluster-based approach to browsing large document collections, 1992, SIGIR '92.

[2] Anil K. Jain, et al. Algorithms for Clustering Data, 1988.

[3] Donna Harman, et al. The fourth text REtrieval conference, 1996.

[4] David R. Karger, et al. Constant interaction-time scatter/gather browsing of very large document collections, 1993, SIGIR.

[5] Jan O. Pedersen, et al. An object-oriented architecture for text retrieval, 1991, RIAO.

[6] Marti A. Hearst, et al. Reexamining the cluster hypothesis: scatter/gather on retrieval results, 1996, SIGIR '96.