Are two document clusters better than one? The Cluster Performance Question for information retrieval

When do information retrieval systems using two document clusters provide better retrieval performance than systems using no clustering? We answer this question for one set of assumptions and suggest how it may be studied under other assumptions. The “Cluster Hypothesis” asks an empirical question about the relationships between documents and user‐supplied relevance judgments, while the “Cluster Performance Question” proposed here focuses on the when and why of information retrieval or digital library performance for clustered and unclustered text databases. The question may be generalized to study the relative performance of m versus n clusters.
