On the efficient search of an XML twig query in large DataGuide trees

XML (Extensible Markup Language) has been embraced as a new approach to data modeling. More and more information is now formatted as semi-structured data, e.g., articles in a digital library or documents on the web. Implementing an efficient system for storing and querying XML documents requires new techniques, and many XML indexing techniques have been proposed in recent years. For XML data, we can distinguish two trees: the XML tree, which is a tree of elements and attributes, and the DataGuide, which is a tree of element tags and attribute names. The XML tree of a document is obviously much larger than its DataGuide. Authors therefore often regard the DataGuide as a small tree and the DataGuide search as a minor problem. However, we show that DataGuide trees are often massive for real XML documents. Consequently, a trivial DataGuide search may be time- and memory-consuming. In this article, we introduce efficient methods for searching for an XML twig pattern in large, complex DataGuide trees.
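To make the distinction between the two trees concrete, the following is a minimal sketch, not the method introduced in the article. It assumes that, for tree-shaped data, a DataGuide can be represented as a nested dictionary holding each distinct root-to-node label path exactly once. The function names (`build_dataguide`, `count_guide_nodes`, `count_xml_nodes`), the `@` prefix for attribute names, and the sample document are hypothetical and chosen only for illustration.

```python
import xml.etree.ElementTree as ET


def build_dataguide(root):
    """Summarize an XML tree as nested dicts with one entry per distinct label path.

    Assumption: element tags and attribute names are the labels; attribute
    names are prefixed with "@" to keep them apart from element tags.
    """
    guide = {}

    def visit(elem, parent_guide):
        # Reuse the entry for this tag if the same label path was seen before.
        node = parent_guide.setdefault(elem.tag, {})
        for attr_name in elem.attrib:
            node.setdefault("@" + attr_name, {})
        for child in elem:
            visit(child, node)

    visit(root, guide)
    return guide


def count_guide_nodes(guide):
    """Count the nodes of the nested-dict DataGuide."""
    return sum(1 + count_guide_nodes(child) for child in guide.values())


def count_xml_nodes(root):
    """Count the element and attribute nodes of the XML tree."""
    return sum(1 + len(elem.attrib) for elem in root.iter())


# Hypothetical sample document, used only to compare the two tree sizes.
SAMPLE = """<library>
  <book id="1"><title>A</title><author>X</author></book>
  <book id="2"><title>B</title><author>Y</author><author>Z</author></book>
  <article><title>C</title></article>
</library>"""

root = ET.fromstring(SAMPLE)
guide = build_dataguide(root)
xml_size = count_xml_nodes(root)
guide_size = count_guide_nodes(guide)
print(xml_size, guide_size)
```

On this regular sample the summary has fewer nodes than the XML tree because repeated `book`, `title`, and `author` paths collapse into one entry each. A document with many distinct label paths collapses far less, which is the situation in which the DataGuide itself becomes large.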
