XXL @ INEX 2003

Information retrieval on XML combines retrieval on content data (element and attribute values) with retrieval on structural data (element and attribute names). Standard query languages for XML such as XPath or XQuery support Boolean retrieval: a query result is a (possibly restructured) subset of XML elements or entire documents that satisfy the search conditions of the query. Such search conditions consist of regular path expressions, including wildcards for paths of arbitrary length, and Boolean content conditions. We developed a flexible XML search language called XXL for probabilistic ranked retrieval on XML data. XXL offers a special operator '~' for specifying semantic similarity search conditions on element names as well as element values. Ontological knowledge and appropriate index structures are necessary for semantic similarity search on XML data extracted from the Web, intranets or other document collections. The XXL Search Engine is a Java-based prototype implementation that supports probabilistic ranked retrieval on a large corpus of XML data. This paper outlines the architecture of the XXL system and discusses its performance in the INEX benchmark.
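To illustrate the idea of a similarity condition on element names, the following is a minimal sketch in Python. It is not the XXL syntax, scoring model, or implementation (the prototype described here is Java-based). The `ONTOLOGY` table, its weights, and the helpers `name_similarity` and `similar_elements` are hypothetical names introduced only for this example. The sketch shows how a Boolean name test can be relaxed into a scored match, so that results are ranked instead of merely accepted or rejected.

```python
import xml.etree.ElementTree as ET

# Hypothetical toy ontology: similarity weights in [0, 1] between terms.
# A real system would derive such weights from an ontology or thesaurus.
ONTOLOGY = {
    ("article", "paper"): 0.9,
    ("article", "report"): 0.6,
}


def name_similarity(query_name, element_name):
    """Return 1.0 for an exact match, else the ontology weight (0.0 if unrelated)."""
    if query_name == element_name:
        return 1.0
    return max(
        ONTOLOGY.get((query_name, element_name), 0.0),
        ONTOLOGY.get((element_name, query_name), 0.0),
    )


def similar_elements(root, query_name, threshold=0.0):
    """Collect elements whose name is similar to query_name, best score first."""
    hits = []
    for elem in root.iter():
        score = name_similarity(query_name, elem.tag)
        if score > threshold:
            hits.append((score, elem.tag, (elem.text or "").strip()))
    # Rank by descending similarity score.
    hits.sort(key=lambda hit: hit[0], reverse=True)
    return hits


DOC = """<collection>
  <report>XML indexing</report>
  <article>Ranked retrieval</article>
  <paper>Ontology search</paper>
  <note>unrelated</note>
</collection>"""

ranked = similar_elements(ET.fromstring(DOC), "article")
for score, tag, text in ranked:
    print(f"{score:.1f}  <{tag}>  {text}")
```

In this sketch an exact name match scores 1.0 and related names score lower, so a query for `article` also returns `paper` and `report` elements in ranked order, while unrelated elements are dropped.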
