Evaluation by comparing result sets in context

Familiar evaluation methodologies for information retrieval (IR) are not well suited to the task of comparing systems in many real settings. These systems and evaluation methods must support contextual, interactive retrieval over changing, heterogeneous data collections, including private and confidential information. We have implemented a comparison tool that can be inserted into the natural IR process. It provides a familiar search interface, presents a small number of result sets in side-by-side panels, elicits searcher judgments, and logs interaction events. The tool permits study of real information needs as they occur, uses the documents actually available at the time of the search, and records judgments that take into account the instantaneous needs of the searcher. We have validated our proposed evaluation approach and explored potential biases by comparing different whole-of-Web search facilities using a Web-based version of the tool. In four experiments, one with supplied queries in the laboratory and three with real queries in the workplace, subjects showed no discernible left-right bias and were able to reliably distinguish between high- and low-quality result sets. We found that judgments were strongly predicted by simple implicit measures. Following validation we undertook a case study comparing two leading whole-of-Web search engines. The approach is now being used in several ongoing investigations.
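To make the workflow concrete, the sketch below shows one way such a comparison tool might log per-panel interaction events and derive simple implicit measures (click counts, time to first click) to predict the searcher's side-by-side judgment. This is a minimal illustration only; the class and function names (ComparisonLog, predict_preference) and the particular measures are assumptions, not the authors' actual implementation.

```python
# Minimal sketch of logging interaction events for a side-by-side result
# comparison and predicting the explicit judgment from implicit measures.
# All names and measures here are illustrative assumptions.

from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Event:
    """One logged interaction event on a result panel."""
    panel: str        # "left" or "right"
    kind: str         # e.g. "click", "hover", "scroll"
    timestamp: float  # seconds since the result sets were shown


@dataclass
class ComparisonLog:
    """Events and the explicit judgment for one query shown in two panels."""
    query: str
    events: List[Event] = field(default_factory=list)
    judgment: Optional[str] = None  # "left", "right", or "equal"

    def record(self, panel: str, kind: str, timestamp: float) -> None:
        self.events.append(Event(panel, kind, timestamp))

    def clicks(self, panel: str) -> int:
        return sum(1 for e in self.events if e.panel == panel and e.kind == "click")

    def time_to_first_click(self, panel: str) -> Optional[float]:
        times = [e.timestamp for e in self.events
                 if e.panel == panel and e.kind == "click"]
        return min(times) if times else None


def predict_preference(log: ComparisonLog) -> str:
    """Predict the judgment from simple implicit measures: more clicks on a
    panel, or an earlier first click, is taken to suggest preference."""
    left, right = log.clicks("left"), log.clicks("right")
    if left != right:
        return "left" if left > right else "right"
    t_left = log.time_to_first_click("left")
    t_right = log.time_to_first_click("right")
    if t_left is not None and t_right is not None and t_left != t_right:
        return "left" if t_left < t_right else "right"
    return "equal"


if __name__ == "__main__":
    log = ComparisonLog(query="information retrieval evaluation")
    log.record("left", "click", 2.4)
    log.record("left", "click", 9.1)
    log.record("right", "click", 15.0)
    log.judgment = "left"
    print(predict_preference(log), "vs explicit judgment:", log.judgment)
```

Comparing such predictions against the elicited judgments over many queries is one plausible way to check the paper's finding that simple implicit measures strongly predict explicit side-by-side preferences.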
