Key Concepts for Native XML Processing

Over the past five years, we have designed, implemented, and optimized our prototype system XTC, a native XDBMS that provides multi-user read/write transactions and supports multi-lingual query interfaces (XQuery, XPath, DOM, SAX). We compared competing concepts in various system layers and iteratively identified solutions that drastically improved overall XDBMS performance. XML query processing depends critically on the smooth interplay of concepts and methods. Here, we focus on the physical level of XML processing: node labeling and mapping options for storage structures; the design of suitable index mechanisms; and enriched functionality of path processing operators, in particular for holistic twig joins. In this survey, we outline the experience gained during the evolution of XTC and develop "key concepts" that enable fine-grained, effective, and efficient XML processing.
