Extended structural relevance framework: a framework for evaluating structured document retrieval

A structured document retrieval (SDR) system aims to minimize the effort users spend to locate relevant information by retrieving parts of documents. To evaluate the range of SDR tasks, from element to passage to tree retrieval, numerous task-specific measures have been proposed. As a result, SDR evaluation measures cannot easily be compared with one another or across tasks. In previous work, we defined the SDR task of tree retrieval, of which passage and element retrieval are special cases. In this paper, we examine tree retrieval in greater detail to identify the main components of SDR evaluation: relevance, navigation, and redundancy. Our goal is to evaluate SDR within a single probabilistic framework based on these components. This framework, called Extended Structural Relevance (ESR), calculates a user's expected gain in relevant information depending on whether it is seen via hits (relevant results retrieved), unseen via misses (relevant results not retrieved), or possibly seen via near-misses (relevant results accessed via navigation). We use these expectations as parameters to formulate evaluation measures for tree retrieval. We then demonstrate how existing task-specific measures, when viewed as tree retrieval, can be formulated, computed, and compared using our framework. Finally, we experimentally validate ESR across a range of SDR tasks.
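The core idea of the framework (expected gain weighted by the probability that each relevant result is actually seen) can be illustrated with a minimal sketch. This is not the paper's exact formulation: the function name, the tuple layout, and the three-way hit/near-miss/miss classification are simplifying assumptions for illustration.

```python
# Illustrative sketch only: expected gain over relevant document parts,
# weighted by the probability that the user sees each part.
def expected_gain(parts):
    """parts: list of (relevance, status, p_nav) tuples, where status is
    'hit' (retrieved), 'near-miss' (reachable via navigation with
    probability p_nav), or 'miss' (not retrieved, not reachable)."""
    total = 0.0
    for rel, status, p_nav in parts:
        if status == 'hit':
            p_seen = 1.0       # seen: relevant result retrieved
        elif status == 'near-miss':
            p_seen = p_nav     # possibly seen via navigation
        else:
            p_seen = 0.0       # unseen: relevant result missed
        total += rel * p_seen
    return total

# One hit, one near-miss reachable with probability 0.5, one miss:
# the hit contributes its full relevance, the near-miss half, the miss nothing.
gain = expected_gain([(1.0, 'hit', None),
                      (0.8, 'near-miss', 0.5),
                      (1.0, 'miss', None)])
```

Measures such as precision- and recall-style scores for tree retrieval can then be built from these expectations, e.g. by normalizing the expected gain against the gain of an ideal result list.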
