Demythization of Structural XML Query Processing: Comparison of Holistic and Binary Approaches, Technical Report

An XML query can be modeled as a twig pattern query (TPQ) that specifies predicates on XML nodes and the XPath relationships that must hold between them. Many TPQ types have been proposed; this paper considers a TPQ model extended with a specification of output and non-output query nodes, since it complies with the XQuery semantics and, in many cases, leads to more efficient query processing. In general, there are two approaches to processing a TPQ: holistic joins and binary joins. Whereas the binary join approach builds a query plan as a tree of interconnected binary operators, the holistic join approach evaluates the whole query using one operator (i.e., one complex algorithm). Surprisingly, a thorough analytical and experimental comparison is still missing despite an enormous research effort in this area. In this paper, we try to fill this gap; we show analytically and experimentally that binary joins used in a fully-pipelined plan (i.e., a plan in which no join operation waits for the complete result of the previous operation and no explicit sorting is used) can often outperform holistic joins, especially for TPQs with a higher ratio of non-output query nodes. The main contributions of this paper can be summarized as follows: (i) we introduce several improvements of existing binary join approaches that make it possible to build a fully-pipelined plan for a TPQ with non-output query nodes, (ii) we prove that, for a certain class of TPQs, such a plan has linear time complexity with respect to the size of the input and output, as well as linear space complexity with respect to the XML document depth (i.e., the same complexity as the holistic join approaches), (iii) we show that our improved binary join approach outperforms the holistic join approaches in many situations, and (iv) we propose a simple combined approach that exploits the advantages of both types of approaches.
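To make the idea of a fully-pipelined binary plan more concrete, the following is a minimal sketch and not the algorithm from the paper. It assumes that nodes carry a region encoding (`start`, `end`) and that every input stream is sorted in document order; the names `Node` and `semi_join_desc` are hypothetical. The operator is a structural semi-join for the ancestor-descendant axis: it emits only descendant nodes, which is sufficient when the ancestor is a non-output query node.

```python
from typing import Iterable, Iterator, NamedTuple


class Node(NamedTuple):
    """Hypothetical region-encoded XML node: a contains d iff
    a.start < d.start and d.end < a.end."""
    start: int
    end: int
    tag: str


def semi_join_desc(ancestors: Iterable[Node],
                   descendants: Iterable[Node]) -> Iterator[Node]:
    """Yield every descendant that lies inside at least one ancestor.

    Both inputs must be sorted by `start` (document order). The output
    keeps the order of `descendants`, so it can feed the next join
    directly, without an explicit sort.
    """
    anc = iter(ancestors)
    a = next(anc, None)
    max_end = -1  # largest `end` among ancestors that start before d
    for d in descendants:
        # Consume all ancestors that open before the current descendant.
        while a is not None and a.start < d.start:
            max_end = max(max_end, a.end)
            a = next(anc, None)
        # Regions in a tree are nested or disjoint, so one comparison
        # against the largest `end` seen so far is enough.
        if d.end < max_end:
            yield d


# Toy document: <a><b><c/></b><b><c/></b></a><b><c/></b><c/>
A = [Node(1, 10, "a")]
B = [Node(2, 5, "b"), Node(6, 9, "b"), Node(11, 14, "b")]
C = [Node(3, 4, "c"), Node(7, 8, "c"), Node(12, 13, "c"), Node(15, 16, "c")]

# Plan for //a//b//c where only c is an output node: two chained
# generators, so no join waits for the complete result of the other.
result = list(semi_join_desc(semi_join_desc(A, B), C))
```

Because each operator is a generator that preserves document order, the plan processes the input streams in a single pass and holds no intermediate result in memory. Handling output ancestors, parent-child edges, or branching twigs would need more machinery than this sketch shows.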
