Path Summaries and Path Partitioning in Modern XML Databases

XML path summaries are compact structures representing all the simple parent-child paths of an XML document. Such paths have also been used in many works as a basis for partitioning the document’s content in a persistent store, in the form of path indices or path tables. We revisit the notions of path summaries and path-driven storage models in the context of current-day XML databases. This context is characterized by complex queries, typically expressed in an XQuery subset, and by the presence of efficient encoding techniques such as structural node identifiers. We review a path summary’s many uses for query optimization, and give them a common basis, namely relevant paths. We discuss summary-based tree pattern minimization and present some efficient summary-based minimization heuristics. We consider relevant path computation and provide a time- and memory-efficient computation algorithm. We combine the principle of path partitioning with the presence of structural identifiers in a simple path-partitioned storage model, which allows for selective data access and efficient query plans. This model improves the efficiency of twig query processing by up to two orders of magnitude over the similar tag-partitioned indexing model. We have implemented the path-partitioned storage model and path summaries in the XQueC compressed database prototype [8]. We present an experimental evaluation of a path summary’s practical feasibility and of tree pattern matching in a path-partitioned store.
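To make the two central notions concrete, the following is a minimal Python sketch, not the XQueC implementation. It builds a path summary (the set of distinct root-to-element parent-child paths) for a small document and partitions the element identifiers by path. The (pre, post) numbering used as structural identifier is an assumption chosen for illustration; the abstract does not say which identifier scheme is used. The names `DOC`, `build_path_summary` and `is_ancestor` are hypothetical, and text nodes and attributes are ignored.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical sample document, used only for illustration.
DOC = (
    "<site><people>"
    "<person><name>A</name></person>"
    "<person><name>B</name><phone>1</phone></person>"
    "</people></site>"
)


def build_path_summary(xml_text):
    """Return (summary, partitions) for the elements of an XML document.

    summary:    sorted list of the distinct root-to-element paths.
    partitions: dict mapping each path to the (pre, post) identifiers
                of the elements found on that path, in document order.
    """
    root = ET.fromstring(xml_text)
    partitions = defaultdict(list)
    counter = {"pre": 0, "post": 0}

    def visit(elem, parent_path):
        path = parent_path + "/" + elem.tag
        counter["pre"] += 1           # rank in pre-order traversal
        pre = counter["pre"]
        for child in elem:
            visit(child, path)
        counter["post"] += 1          # rank in post-order traversal
        partitions[path].append((pre, counter["post"]))

    visit(root, "")
    return sorted(partitions), dict(partitions)


def is_ancestor(a, d):
    """True if the element with identifier a is an ancestor of d."""
    return a[0] < d[0] and a[1] > d[1]


summary, partitions = build_path_summary(DOC)

# Selective access: only two partitions are read to find the persons
# that have a phone; name elements are never touched.
persons = partitions["/site/people/person"]
phones = partitions["/site/people/person/phone"]
with_phone = [p for p in persons if any(is_ancestor(p, ph) for ph in phones)]
```

In this sketch the summary has five paths for eight elements, and `with_phone` holds the identifier of the second `person` only. A tag-partitioned layout would group all `name` elements together regardless of where they occur, whereas here each partition corresponds to exactly one summary path.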

[1]  Neoklis Polyzotis,et al.  Structure and Value Synopses for XML Data Graphs , 2002, VLDB.

[2]  Ioana Manolescu,et al.  XML Access Modules: Towards Physical Data Independence in XML Databases , 2005, XIME-P.

[3]  Neoklis Polyzotis,et al.  Statistical synopses for graph-structured XML databases , 2002, SIGMOD '02.

[4]  Ehud Gudes,et al.  Exploiting local similarity for indexing paths in graph-structured data , 2002, Proceedings 18th International Conference on Data Engineering.

[5]  Jennifer Widom,et al.  Indexing Semistructured Data , 1998 .

[6]  Ioana Manolescu,et al.  Path Sequence-Based XML Query Processing , 2004, BDA.

[7]  C. M. Sperberg-McQueen,et al.  Extensible Markup Language (XML) , 1997, World Wide Web J..

[8]  Dan Suciu,et al.  Index Structures for Path Expressions , 1999, ICDT.

[9]  Hao He,et al.  Multiresolution indexing of XML for frequent queries , 2004, Proceedings. 20th International Conference on Data Engineering.

[10]  Andrew Lim,et al.  D(k)-index: an adaptive structural summary for graph-structured data , 2003, SIGMOD '03.

[11]  Laks V. S. Lakshmanan,et al.  On Testing Satisfiability of Tree Pattern Queries , 2004, VLDB.

[12]  Cong Yu,et al.  TIMBER: A native XML database , 2002, The VLDB Journal.

[13]  Dan Suciu,et al.  Containment and equivalence for an XPath fragment , 2002, PODS.

[14]  Jeffrey F. Naughton,et al.  Estimating the Selectivity of XML Path Expressions for Internet Scale Applications , 2001, VLDB.

[15]  Hamid Pirahesh,et al.  A Framework for Using Materialized XPath Views in XML Query Processing , 2004, VLDB.

[16]  Gianni Costa,et al.  XQueC: pushing queries to compressed XML data (demo) , 2003, VLDB.

[17]  Daniela Florescu,et al.  Storing and Querying XML Data using an RDBMS , 1999, IEEE Data Eng. Bull..

[18]  Jignesh M. Patel,et al.  Structural joins: a primitive for efficient XML query pattern matching , 2002, Proceedings 18th International Conference on Data Engineering.

[19]  Kyuseok Shim,et al.  APEX: an adaptive path index for XML data , 2002, SIGMOD '02.

[20]  Eugene J. Shekita,et al.  Querying XML Views of Relational Data , 2001, VLDB.

[21]  Sven Helmer,et al.  Anatomy of a native XML base management system , 2002, The VLDB Journal.

[22]  Z. Meral Özsoyoglu,et al.  Rewriting XPath Queries Using Materialized Views , 2005, VLDB.

[23]  Jeffrey F. Naughton,et al.  Covering indexes for branching path queries , 2002, SIGMOD '02.

[24]  David J. DeWitt,et al.  Mixed Mode XML Query Processing , 2003, VLDB.

[25]  Laks V. S. Lakshmanan,et al.  Tree logical classes for efficient evaluation of XQuery , 2004, SIGMOD '04.

[26]  Ioana Manolescu,et al.  Efficient Query Evaluation over Compressed XML Data , 2004, EDBT.

[27]  Jeffrey D. Ullman.  Principles of database and knowledge-base systems , 1989 .

[28]  Georg Gottlob,et al.  The complexity of XPath query evaluation , 2003, PODS.

[29]  Laks V. S. Lakshmanan,et al.  Minimization of tree pattern queries , 2001, SIGMOD '01.

[30]  Peter Buneman,et al.  Path Queries on Compressed XML , 2022 .

[31]  Wenfei Fan,et al.  Vectorizing and querying large XML repositories , 2005, 21st International Conference on Data Engineering (ICDE'05).

[32]  Laks V. S. Lakshmanan,et al.  On the evaluation of tree pattern queries , 2006, ICSOFT.

[33]  Roy Goldman,et al.  DataGuides: Enabling Query Formulation and Optimization in Semistructured Databases , 1997, VLDB.

[34]  Carlo Zaniolo,et al.  Efficient Structural Joins on Indexed XML Documents , 2002, VLDB.

[35]  Torsten Grust,et al.  Bridging the GAP Between Relational and Native XML Storage with Staircase Join , 2003, Grundlagen von Datenbanken.

[36]  Jeffrey D. Ullman,et al.  Representative objects: concise representations of semistructured, hierarchical data , 1997, Proceedings 13th International Conference on Data Engineering.

[37]  Alin Deutsch,et al.  The NEXT Logical Framework for XQuery , 2004, VLDB.

[38]  Divesh Srivastava,et al.  Holistic twig joins: optimal XML pattern matching , 2002, SIGMOD '02.

[39]  Beng Chin Ooi,et al.  A Statistical Approach for XML Query Size Estimation , 2004, EDBT Workshops.

[40]  Ioana Manolescu,et al.  A Test Platform for the INEX Heterogeneous Track , 2004, INEX.

[41]  Chun Zhang,et al.  Storing and querying ordered XML using a relational database system , 2002, SIGMOD '02.

[42]  Denilson Barbosa,et al.  The XML web: a first study , 2003, WWW '03.

[43]  Patrick E. O'Neil,et al.  ORDPATHs: insert-friendly XML node labels , 2004, SIGMOD '04.

[44]  Ronald Fagin,et al.  Multivalued dependencies and a new normal form for relational databases , 1977, TODS.

[45]  Yossi Matias,et al.  Fractional XSketch Synopses for XML Databases , 2004, XSym.

[46]  Denilson Barbosa,et al.  The Toronto XML Engine , 2001 .

[47]  Ioana Manolescu,et al.  ULoad: Choosing the Right Storage for Your XML Application , 2005, VLDB.

[48]  Alberto O. Mendelzon,et al.  Benefits of Path Summaries in an XML Query Optimizer Supporting Multiple Access Methods , 2005, VLDB.

[49]  Jan Hidders,et al.  Avoiding Unnecessary Ordering Operations in XPath , 2003, DBPL.

[50]  H. V. Jagadish,et al.  Pattern Tree Algebras: Sets or Sequences? , 2005, VLDB.