Answering content and structure-based queries on XML documents using relevance propagation

As XML documents contain both content and structure information, exploiting the document structure in the retrieval process can help to better identify relevant information units. In this paper, we describe an information retrieval (IR) approach for queries composed of content and structure conditions. The XFIRM model we propose is designed to be as flexible as possible in processing such queries. It is based on a complete query language derived from XPath and on a relevance value propagation method. This paper evaluates the functions used in the propagation process, and in particular the use of the distance between nodes as a parameter. The proposed method is evaluated within the INEX evaluation initiative. Results show that our proposal achieves relatively high precision.
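To make the idea of distance-aware relevance propagation concrete, the following is a minimal sketch and not the XFIRM implementation. It assumes that each element's own text is scored by a simple count of query terms, and that a score is attenuated by a factor `alpha` for every tree edge it crosses on its way up to an ancestor. The names `leaf_score`, `propagate`, `score_document`, the value of `alpha`, and the decay function itself are all hypothetical choices made for illustration; the abstract does not specify them.

```python
import xml.etree.ElementTree as ET


def leaf_score(text, terms):
    """Hypothetical content score: number of query-term occurrences in the text."""
    words = (text or "").lower().split()
    return float(sum(words.count(t) for t in terms))


def propagate(node, terms, alpha, path, scores):
    """Score `node` as its own content score plus the attenuated scores of its children.

    Because every child score is multiplied by alpha once per level, a scored
    element at distance d below `node` contributes alpha**d times its own score.
    Text that follows a child element (its tail) is ignored for simplicity.
    """
    here = f"{path}/{node.tag}"
    own = leaf_score(node.text, terms)
    total = own + alpha * sum(
        propagate(child, terms, alpha, here, scores) for child in node
    )
    scores.append((here, total))
    return total


def score_document(xml_string, terms, alpha=0.5):
    """Return (path, score) pairs for every element, best score first."""
    root = ET.fromstring(xml_string)
    scores = []
    propagate(root, [t.lower() for t in terms], alpha, "", scores)
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    doc = "<article><title>xml retrieval</title><sec><p>xml ranking</p><p>cooking</p></sec></article>"
    for path, score in score_document(doc, ["xml"], alpha=0.5):
        print(path, score)
```

Under these assumptions, an element that directly contains the query terms outranks its ancestors, and the further an ancestor is from the matching text, the less relevance it inherits. Varying `alpha` is one simple way to study the effect of node distance on the ranking.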
