Index structures for matching XML twigs using relational query processors

Various index structures have been proposed to speed up the evaluation of XML path expressions. However, existing XML path indices suffer from at least one of three limitations: they focus only on indexing the structure (relying on a separate index for node content), they are useful only for simple path expressions such as root-to-leaf paths, or they cannot be tightly integrated with a relational query processor. Moreover, there is no unified framework to compare these index structures. In this paper, we present a framework defining a family of index structures that includes most existing XML path indices. We also propose two novel index structures in this family, with different space-time tradeoffs, that are effective for the evaluation of XML branching path expressions (i.e., twigs) with value conditions. We also show how this family of index structures can be implemented using the access methods of the underlying relational database system. Finally, we present an experimental evaluation that shows the performance tradeoff between index space and matching time. The experimental results show that our novel indices achieve orders of magnitude improvement in performance for evaluating twig queries, albeit at a higher space cost, over the use of previously proposed XML path indices that can be tightly integrated with a relational query processor.

[1]  Kyuseok Shim,et al.  APEX: an adaptive path index for XML data , 2002, SIGMOD '02.

[2]  Jeffrey F. Naughton,et al.  Covering indexes for branching path queries , 2002, SIGMOD '02.

[3]  Torsten. Grust,et al.  Accelerating XPath location steps , 2002, SIGMOD '02.

[4]  Vishu Krishnamurthy,et al.  Performance Challenges in Object-Relational DBMSs , 1999, IEEE Data Eng. Bull..

[5]  David J. DeWitt,et al.  On supporting containment queries in relational database management systems , 2001, SIGMOD '01.

[6]  Bongki Moon,et al.  PRIX: indexing and querying XML using prufer sequences , 2004, Proceedings. 20th International Conference on Data Engineering.

[7]  Alin Deutsch,et al.  Storing semistructured data with STORED , 1999, SIGMOD '99.

[8]  Philip S. Yu,et al.  ViST: a dynamic index method for querying XML data by tree structures , 2003, SIGMOD '03.

[9]  Quanzhong Li,et al.  Indexing and Querying XML Data for Regular Path Expressions , 2001, VLDB.

[10]  Dan Suciu,et al.  Index Structures for Path Expressions , 1999, ICDT.

[11]  Elisa Bertino,et al.  Indexing Techniques for Queries on Nested Objects , 1989, IEEE Trans. Knowl. Data Eng..

[12]  Alberto O. Mendelzon,et al.  Indexing XML Data with ToXin , 2001, WebDB.

[13]  David J. DeWitt,et al.  The Niagara Internet Query System , 2001, IEEE Data Eng. Bull..

[14]  Patrick Valduriez,et al.  Join indices , 1987, TODS.

[15]  Roy Goldman,et al.  From Semistructured Data to XML: Migrating the Lore Data Model and Query Language , 1999, Markup Lang..

[16]  David J. DeWitt,et al.  Relational Databases for Querying XML Documents: Limitations and Opportunities , 1999, VLDB.

[17]  Guido Moerkotte,et al.  Access support in object bases , 1990, SIGMOD '90.

[18]  Jiawei Han,et al.  Join Index Hierarchies for Supporting Efficient Navigations in Object-Oriented Databases , 1994, VLDB.

[19]  Michael J. Franklin,et al.  A Fast Index for Semistructured Data , 2001, VLDB.

[20]  Jennifer Widom,et al.  Query Optimization for XML , 1999, VLDB.

[21]  Roy Goldman,et al.  DataGuides: Enabling Query Formulation and Optimization in Semistructured Databases , 1997, VLDB.

[22]  Toshiyuki Amagasa,et al.  XRel: a path-based approach to storage and retrieval of XML documents using relational databases , 2001, ACM Trans. Internet Techn..

[23]  Cong Yu,et al.  TIMBER: A native XML database , 2002, The VLDB Journal.

[24]  Jignesh M. Patel,et al.  Structural joins: a primitive for efficient XML query pattern matching , 2002, Proceedings 18th International Conference on Data Engineering.