Do batch and user evaluations give the same results?

Do improvements in system performance demonstrated by batch evaluations confer the same benefit for real users? We carried out experiments designed to investigate this question. After identifying a weighting scheme that gave maximum improvement over the baseline in a non-interactive evaluation, we used it with real users searching on an instance recall task. Our results showed that the weighting scheme that gave beneficial results in batch studies did not do so with real users. Further analysis did identify other factors predictive of instance recall, including the number of documents saved by the user, document recall, and the number of documents seen by the user.