Flexible XML Querying Using Skyline Semantics

Preferences over the results of an XML query come in two distinct flavors. First, the user may prefer results that contain desired values, e.g., lower prices, favorite foods, or higher ratings. Second, the user may prefer results with a certain structure, e.g., the existence of a "discount" node, or the existence of an edge (rather than merely a path) between "departure" and "arrival" nodes. The first type of preference has been studied extensively over relational data, using skyline semantics, but has barely been considered for XML. The second type of preference has been studied for XML in the context of inexact querying, using scoring functions to rank results. This paper presents a query language for XML that incorporates both value-based and structural preferences. Skyline semantics is used to determine the optimal results. Algorithms for query evaluation under skyline semantics are presented, and experiments demonstrate their efficiency. The paper is novel in three respects. First, it considers skyline querying over XML data values, rather than over values in a relational database. Second, it presents a method for inexact querying of the structure of XML that is based on computing a skyline, instead of using scoring functions. Third, it combines both types of user preference into a single language. Together, these facets yield a versatile language for flexible querying of XML.
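To make the notion of skyline semantics concrete, the following is a minimal sketch of the standard dominance test and a naive quadratic skyline computation. It is not the paper's algorithm or query language. The record layout, the attribute names (`price`, `rating`, `discount`), and the encoding of a structural preference as a 0/1 flag are assumptions made purely for illustration.

```python
from typing import Any, Dict, List

Result = Dict[str, Any]  # hypothetical flat view of one query result


def dominates(a: Result, b: Result, minimize: List[str], maximize: List[str]) -> bool:
    """Return True if a is at least as good as b on every attribute
    and strictly better on at least one."""
    at_least_as_good = (
        all(a[k] <= b[k] for k in minimize)
        and all(a[k] >= b[k] for k in maximize)
    )
    strictly_better = (
        any(a[k] < b[k] for k in minimize)
        or any(a[k] > b[k] for k in maximize)
    )
    return at_least_as_good and strictly_better


def skyline(results: List[Result], minimize: List[str], maximize: List[str]) -> List[Result]:
    """Naive O(n^2) skyline: keep every result that no other result dominates."""
    return [
        r for r in results
        if not any(dominates(other, r, minimize, maximize) for other in results)
    ]


# Illustrative data: "discount" = 1 stands in for the structural preference
# "the result contains a discount node" (an assumed encoding).
offers = [
    {"id": "a", "price": 100, "rating": 4, "discount": 1},
    {"id": "b", "price": 120, "rating": 4, "discount": 0},  # dominated by "a"
    {"id": "c", "price": 90, "rating": 3, "discount": 0},
    {"id": "d", "price": 150, "rating": 5, "discount": 0},
]

best = skyline(offers, minimize=["price"], maximize=["rating", "discount"])
print([r["id"] for r in best])  # ['a', 'c', 'd']
```

No scoring function is involved: offer "b" is excluded only because "a" is at least as good on every attribute and strictly better on at least one, while "a", "c", and "d" are mutually incomparable and therefore all remain in the result.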
