Sound ranking algorithms for XML search in PF/Tijah

We argue that ranking algorithms for XML should reflect the actual combined content and structure constraints of queries, while at the same time producing equal rankings for queries that are semantically equal. Ranking algorithms that produce different rankings for semantically equal queries are easily detected by tests on large databases; we call such algorithms not sound. We report the behaviour of different approaches to ranking content-and-structure queries on pairs of queries for which the query semantics lead us to expect equal ranking results. We show that most of these approaches are not sound. Of the remaining approaches, only three adhere to the W3C XQuery Full-Text standard.
