ETH Zürich at INEX: Flexible Information Retrieval from XML with PowerDB-XML

When searching for relevant information in XML documents, users want to exploit the document structure when posing their queries. Queries over XML documents therefore dynamically restrict the context of interest to arbitrary combinations of XML element types. State-of-the-art information retrieval (IR), however, derives statistics such as document frequencies for the collection as a whole. When the context of interest is defined dynamically by the user query, such collection-wide statistics may lead to inconsistent rankings for XML documents that have heterogeneous content from different domains. To guarantee consistent retrieval, our XML engine PowerDB-XML derives on-the-fly, i.e., at query runtime, the IR statistics that consistently reflect the scope of interest defined by the user query. To compute these dynamic IR statistics efficiently, our implementation relies on underlying basic indexes and statistics data. This paper reports on our experiences from participating in INEX, the INitiative for the Evaluation of XML retrieval.
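The idea of deriving scope-specific statistics at query time can be illustrated with a minimal sketch. It is not the PowerDB-XML implementation, whose index layout the abstract does not describe. The sketch assumes that basic statistics (element counts and per-term element frequencies) are precomputed per XML element type, and that element types are disjoint, so that the statistics for a query scope can be obtained by summing over the element types the query selects. The names `BASE_STATS` and `scope_idf`, the sample numbers, and the plain `log(N / df)` weighting are all hypothetical choices made for illustration.

```python
import math
from collections import Counter

# Hypothetical basic statistics, precomputed once per element type:
# "n" is the number of elements of that type, "df" maps a term to the
# number of those elements that contain it. The figures are made up.
BASE_STATS = {
    "abstract": {"n": 100, "df": Counter({"xml": 40, "index": 10})},
    "section": {"n": 400, "df": Counter({"xml": 20, "index": 200})},
    "caption": {"n": 500, "df": Counter({"xml": 5})},
}


def scope_idf(term, scope, base_stats=BASE_STATS):
    """Inverse document frequency of `term` restricted to `scope`.

    `scope` is an iterable of element type names chosen by the query.
    The scope-level counts are derived at query time by summing the
    per-type basic statistics (assumes disjoint element types).
    """
    n = sum(base_stats[t]["n"] for t in scope)
    df = sum(base_stats[t]["df"][term] for t in scope)
    if df == 0:
        return 0.0  # term does not occur anywhere in the scope
    return math.log(n / df)


if __name__ == "__main__":
    # The same term is weighted differently depending on the scope.
    print(scope_idf("xml", ["abstract"]))
    print(scope_idf("xml", ["abstract", "section", "caption"]))
```

In this toy data the term "xml" is common within abstracts but rare across the whole collection, so a query restricted to abstracts gives it a lower weight than collection-wide statistics would. This is the kind of discrepancy that scope-consistent statistics are meant to remove.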
