Tree-Pattern Queries on a Lightweight XML Processor

Popular XML languages, such as XPath, use "tree-pattern" queries to select nodes based on their structural characteristics. Although many processing methods have been proposed for such queries, none has found its way into any existing "lightweight" XML engine (i.e., an engine without an optimization module). The main reason is the lack of a systematic comparison of query methods under a common storage model. In this work, we aim to fill this gap and answer two questions: what are the similarities and important differences among the tree-pattern query methods, and is there a method that stands out in effectiveness and robustness and that an XML processor should therefore support? For the first question, we propose a novel classification of the methods according to their matching process. We then describe a common storage model and demonstrate that the access pattern of each class conforms, or can be adapted to conform, to this model. For the second question, we perform an experimental evaluation to compare the relative performance of the methods. Based on the evaluation results, we conclude that the family of holistic processing methods, which provides performance guarantees, is the most robust alternative for such an environment.
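
To make the notion of a tree-pattern query concrete, the following is a minimal sketch using Python's standard xml.etree.ElementTree module. The document and the query are hypothetical and chosen only for illustration; the sketch is not one of the processing methods compared in this work. The pattern selects the title of every book that also has an author child, which is a small branching ("twig") pattern rather than a simple path.

```python
import xml.etree.ElementTree as ET

# Hypothetical document, invented for this illustration.
DOC = """
<library>
  <book><title>A</title><author>X</author></book>
  <book><title>B</title></book>
  <journal><title>C</title><author>Y</author></journal>
</library>
"""

root = ET.fromstring(DOC)

# Tree pattern: a <book> node with an <author> child and a <title> child.
# The <title> node is the output node of the pattern.
# In XPath notation: //book[author]/title
titles = [t.text for t in root.findall(".//book[author]/title")]

print(titles)  # ['A']
```

Only the first book matches: the second book has no author child, and the journal element fails the tag test on the pattern root even though it has both children.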
