XML indexing and storage: fulfilling the wish list

XML indexing and storage (XMIS) techniques are crucial for the functionality and overall performance of an XML database management system (XDBMS). Because of the complexity of XQuery and the performance demands of XML query processing, efficient path processing operators, including those for tree-pattern queries (so-called twigs), are urgently needed, and tailor-made indexes and their flexible use are indispensable for them. Although XML indexing and storage are standard problems and manifold approaches have been proposed in the last decade, sufficiently adaptive and broad solutions that support query evaluation for all path processing operators are still missing in the XDBMS context. Therefore, we think it is worthwhile to take a step back and look at the complete picture to derive a holistic solution. To do so, we first compile an XMIS wish list containing what are, in our opinion, the essential functional storage and indexing requirements of a modern XDBMS. With these desiderata in mind, we then develop a new XMIS scheme, which, by reconsidering previous work, can be seen as a practical and general approach to XML storage and indexing. Interestingly, by working on both problems at the same time, we can let the storage and index managers live in a kind of symbiotic partnership, because the document store reuses ideas originally proposed by the indexing community and vice versa. The XMIS scheme is implemented in XTC, an XDBMS used for empirical tests.
