Evaluation in Context

All search happens in a particular context, such as the collection of a digital library, its associated search tasks, and its users. Information retrieval researchers usually agree on the importance of context, but they rarely address the issue. In particular, evaluation in the Cranfield tradition requires abstracting away from individual differences between users. This paper investigates whether we can bring some of this context into the Cranfield paradigm. Our approach is to record the "context" of the humans already in the loop, the topic authors and assessors, by designing targeted questionnaires. The questionnaire data become part of the evaluation test suite as valuable data on the context of the search requests. We experimented with this questionnaire approach during the evaluation campaign of the INitiative for the Evaluation of XML Retrieval (INEX). The results of this case study demonstrate the viability of the questionnaire approach as a means to capture context in evaluation. It can help explain and control some of the user or topic variation in the test collection. Moreover, it allows us to break the set of topics down into meaningful categories, e.g. those that suit a particular task scenario, and to zoom in on the relative performance for such a group of topics.
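
To illustrate the kind of breakdown the abstract describes, here is a minimal sketch, assuming hypothetical topic identifiers, per-topic effectiveness scores, and questionnaire-derived scenario labels (none of these are the actual INEX data). It groups topics by the task scenario stated by the topic author and reports a system's mean score per group.

```python
import statistics
from collections import defaultdict

# Hypothetical per-topic effectiveness scores for one system
# (e.g. an ad hoc retrieval measure computed per topic).
topic_scores = {
    "T544": 0.41, "T547": 0.28, "T552": 0.63,
    "T553": 0.35, "T560": 0.52, "T561": 0.19,
}

# Hypothetical questionnaire answers: the topic author's stated task scenario.
topic_scenario = {
    "T544": "fact-finding", "T547": "research",
    "T552": "fact-finding", "T553": "research",
    "T560": "research",     "T561": "fact-finding",
}

# Group topics by scenario and report mean effectiveness per group,
# i.e. zoom in on the relative performance for each category of topics.
groups = defaultdict(list)
for topic, score in topic_scores.items():
    groups[topic_scenario[topic]].append(score)

for scenario, scores in sorted(groups.items()):
    print(f"{scenario:>12}: n={len(scores)}, mean score={statistics.mean(scores):.3f}")
```

In the same spirit, any questionnaire field (familiarity with the topic, intended granularity of results, and so on) could serve as the grouping key, which is what makes the questionnaire data a reusable part of the test collection rather than a one-off survey.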
