Containment of partially specified tree-pattern queries in the presence of dimension graphs

Huge volumes of data are now organized or exported in tree-structured form, and querying capabilities are provided through tree-pattern queries. The need to query tree-structured data sources whose structure is not fully known, and the need to integrate multiple data sources with different tree structures, have recently motivated query languages that relax the complete specification of a tree pattern. In this paper, we consider a query language that allows the partial specification of a tree pattern. Queries in this language range from structureless keyword-based queries to completely specified tree patterns. To support the evaluation of partially specified queries, we use semantically rich constructs, called dimension graphs, which abstract structural information of the tree-structured data. We address the problem of query containment in the presence of dimension graphs and provide necessary and sufficient conditions for query containment. Because checking query containment can be expensive, we suggest two heuristic approaches for query containment in the presence of dimension graphs. Our approaches are based on extracting structural information from the dimension graph that can be added to the queries while preserving equivalence with respect to the dimension graph. We consider both cases: extracting and storing different types of structural information in advance, and extracting information on the fly (at query time). Both approaches are implemented, validated, and compared through experimental evaluation.
