Almost-constant-time clustering of arbitrary corpus subsets

Methods exist for constant-time clustering of corpus subsets selected via Scatter/Gather browsing [3]. In this paper we expand on those techniques, giving an algorithm for almost-constant-time clustering of arbitrary corpus subsets. This algorithm is never slower than clustering the document set from scratch, and for medium-sized and large sets it is significantly faster. The algorithm is useful for clustering arbitrary subsets of large corpora (obtained, for instance, by a boolean search) quickly enough to be useful in an interactive setting.