Structural relevance: a common basis for the evaluation of structured document retrieval

This paper presents a unified framework for evaluating a range of structured document retrieval (SDR) approaches and tasks. The framework is based on a model of tree retrieval, evaluated using a novel extension of the Structural Relevance (SR) measure. SR replaces the independence assumption of traditional information retrieval (IR) with a notion of redundancy that accounts for how users navigate inside documents while seeking relevant information. Unlike existing metrics for SDR, the proposed framework does not require computing an ideal ranking, a requirement that has so far prevented the practical application of such measures. Instead, SR builds on a Markovian model of user navigation that can be estimated using structural summaries. The results, supported by experimental validation on INEX data, show that SR defined over a tree retrieval model can provide a common basis for evaluating SDR approaches across various structured search tasks.
