Controlling overlap in content-oriented XML retrieval

The direct application of standard ranking techniques to retrieve individual elements from a collection of XML documents often produces a result set in which the top ranks are dominated by a large number of elements taken from a small number of highly relevant documents. This paper presents and evaluates an algorithm that re-ranks this result set, with the aim of minimizing redundant content while preserving the benefits of element retrieval, including the benefit of identifying topic-focused components contained within relevant documents. The test collection developed by the INitiative for the Evaluation of XML Retrieval (INEX) forms the basis for the evaluation.
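The abstract does not spell out the re-ranking algorithm itself. To make the problem it targets concrete, here is a minimal sketch under stated assumptions. Two retrieved elements are treated as overlapping when one contains the other. A simple greedy pass then multiplies the score of every element that overlaps an already-ranked element by a fixed factor. This is not the paper's algorithm. The `Element` type, the path encoding, the functions `overlaps` and `rerank`, and the `penalty` value are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Element:
    """A retrieved XML element (hypothetical representation)."""
    doc: str      # document identifier
    path: str     # absolute element path, e.g. "/article[1]/sec[2]"
    score: float  # score assigned by the base ranking


def overlaps(a: Element, b: Element) -> bool:
    """True if one element contains the other (or they are the same element)."""
    if a.doc != b.doc:
        return False
    shorter, longer = sorted((a.path, b.path), key=len)
    # Require a step boundary so "/sec[1]" does not match "/sec[10]".
    return longer == shorter or longer.startswith(shorter + "/")


def rerank(elements: list[Element], penalty: float = 0.5) -> list[tuple[Element, float]]:
    """Greedy overlap-penalizing re-ranking (illustrative only).

    Repeatedly emits the element with the highest adjusted score, then
    multiplies the adjusted score of every remaining element that overlaps
    it by `penalty` (a hypothetical parameter).
    """
    adjusted = {e: e.score for e in elements}
    ranked = []
    while adjusted:
        best = max(adjusted, key=adjusted.get)
        ranked.append((best, adjusted.pop(best)))
        for e in adjusted:
            if overlaps(best, e):
                adjusted[e] *= penalty
    return ranked


# Hypothetical base ranking: three nested elements from one document
# dominate the top ranks, as the abstract describes.
results = [
    Element("d1", "/article[1]", 0.9),
    Element("d1", "/article[1]/sec[1]", 0.85),
    Element("d1", "/article[1]/sec[1]/p[1]", 0.8),
    Element("d2", "/article[1]/sec[3]", 0.7),
]
for elem, score in rerank(results):
    print(elem.doc, elem.path, round(score, 3))
```

With `penalty=0.5`, the section from document `d2` moves up to second place. It now sits ahead of the two components nested inside the top-ranked `d1` article, and those nested elements remain in the list rather than being discarded. A smaller penalty pushes contained elements further down the ranking, and a penalty of 1.0 reproduces the original ranking.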
