Querying XML in TIMBER

In this paper, we describe the TIMBER XML database system implemented at the University of Michigan. TIMBER was one of the first native XML database systems, designed from the ground up to store and query semi-structured data. A distinctive principle of TIMBER is its algebraic underpinning. Central contributions of the TIMBER project include: (1) tree algebras that capture the structural nature of XML queries; (2) the stack-based family of algorithms for evaluating structural joins; (3) new rule-based query optimization techniques that account for the heterogeneous nature of intermediate results and take schema information into consideration; (4) cost-based query optimization techniques and summary structures for result cardinality estimation; and (5) a family of structural indices for more efficient query evaluation. We describe not only the architecture of TIMBER, its storage model, and the engineering choices we made, but also present, in hindsight, our view of what went well and not so well with our design and engineering choices.

Figure 1: TIMBER Architecture: XML documents are parsed and nodes stored individually in the back-end store. Parsed queries, from multiple supported interfaces, go through a query optimizer to the query evaluator in a relatively standard overall database system architecture.

The TIMBER system [10, 16] was developed at the University of Michigan, Ann Arbor, beginning in 1999. It was an early native XML data management system. In this retrospective, we take stock of our work over the past nine years. Figure 1 provides an overview of the major system components. Secs. 1 through 4 describe the underlying algebra, query evaluation methods, query optimization, and indices, respectively. Sec. 5 mentions aspects of TIMBER not included in this article. Sec. 6 concludes with a retrospective view.

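
Contribution (2) above, the stack-based evaluation of structural joins, can be illustrated with a small sketch. It is a simplified illustration in the spirit of that family of algorithms, not TIMBER's actual implementation. It assumes each node carries a (start, end) interval assigned by a depth-first numbering, so that one node is an ancestor of another exactly when its interval contains the other's; this encoding, the `Region` type, and the function name are assumptions introduced here and are not taken from the text above.

```python
from typing import List, NamedTuple, Tuple


class Region(NamedTuple):
    """Hypothetical node encoding: interval from a depth-first numbering."""
    start: int
    end: int
    tag: str


def stack_structural_join(
    ancestors: List[Region], descendants: List[Region]
) -> List[Tuple[Region, Region]]:
    """Return all (a, d) pairs where a's interval contains d's.

    Both inputs must be sorted by `start` and must come from the same
    tree numbering, so any two intervals are either nested or disjoint.
    """
    out: List[Tuple[Region, Region]] = []
    stack: List[Region] = []  # chain of nested ancestor candidates, outermost first
    i = 0
    for d in descendants:
        # Push every ancestor candidate that starts before d.
        while i < len(ancestors) and ancestors[i].start < d.start:
            a = ancestors[i]
            # Drop candidates that closed before a opened.
            while stack and stack[-1].end < a.start:
                stack.pop()
            stack.append(a)
            i += 1
        # Drop candidates that closed before d opened.
        while stack and stack[-1].end < d.start:
            stack.pop()
        # Every candidate still on the stack contains d.
        for a in stack:
            out.append((a, d))
    return out


# Intervals for the toy document <a><b/><a><b/></a></a><b/>.
a_nodes = [Region(1, 8, "a"), Region(4, 7, "a")]
b_nodes = [Region(2, 3, "b"), Region(5, 6, "b"), Region(9, 10, "b")]
pairs = stack_structural_join(a_nodes, b_nodes)
```

In this sketch each input list is scanned once and each candidate is pushed and popped at most once, so the work beyond producing the output pairs is linear in the input sizes. The last `b` lies outside both `a` intervals and therefore contributes no pair.
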
1 Algebra

Relational algebra has been a crucial foundation for relational database systems, and has played a large role in enabling their success. A corresponding XML algebra for XML query processing has been more elusive, due to the comparative complexity of XML, and its history. In the relational model, a tuple is the basic unit of operation and a relation is a set of tuples. In XML, a database is often described as a forest of rooted node-labeled trees. Hence, for the basic unit and central construct of our algebra, we chose an XML query pattern (or

Copyright 2008 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE. Bulletin of the IEEE Computer Society Technical Committee on Data Engineering
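
The contrast drawn in Sec. 1 between a relation (a set of tuples) and an XML database (a forest of rooted node-labeled trees) can be made concrete with a minimal sketch. The `Node` class and the `match_ancestor_descendant` helper below are hypothetical names introduced for illustration only; they model the simplest possible query pattern, two labeled nodes connected by an ancestor-descendant edge, and are not the pattern construct defined by the TIMBER algebra.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple


@dataclass
class Node:
    """A node in a rooted, node-labeled tree (hypothetical minimal model)."""
    label: str
    children: List["Node"] = field(default_factory=list)

    def descendants(self) -> Iterator["Node"]:
        """Yield every proper descendant in document (pre-)order."""
        for child in self.children:
            yield child
            yield from child.descendants()


def match_ancestor_descendant(
    forest: List[Node], anc_label: str, desc_label: str
) -> List[Tuple[Node, Node]]:
    """Return every (ancestor, descendant) pair with the given labels."""
    matches: List[Tuple[Node, Node]] = []
    for root in forest:
        # Consider the root itself and all nodes below it as ancestors.
        for node in [root, *root.descendants()]:
            if node.label != anc_label:
                continue
            for d in node.descendants():
                if d.label == desc_label:
                    matches.append((node, d))
    return matches


# A database is a forest: here, two small trees.
forest = [
    Node("book", [Node("title"), Node("chapter", [Node("title")])]),
    Node("article", [Node("title")]),
]
book_titles = match_ancestor_descendant(forest, "book", "title")
chapter_titles = match_ancestor_descendant(forest, "chapter", "title")
```

Where a relational operator consumes and produces tuples, an operator in this setting consumes and produces matches of a pattern against trees. The naive nested traversal used here is only meant to show the data model; it says nothing about how such matches would be computed efficiently.
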

[1] Ioana Manolescu, et al. XMark: A Benchmark for XML Data Management, 2002, VLDB.

[2] H. V. Jagadish, et al. ProTDB: Probabilistic Data in XML, 2002, VLDB.

[3] George H. L. Fletcher, et al. Structural characterizations of the semantics of XPath as navigation tool on a document, 2006, PODS.

[4] Laks V. S. Lakshmanan, et al. TAX: A Tree Algebra for XML, 2001, DBPL.

[5] Laks V. S. Lakshmanan, et al. Tree logical classes for efficient evaluation of XQuery, 2004, SIGMOD '04.

[6] H. V. Jagadish, et al. Pattern Tree Algebras: Sets or Sequences?, 2005, VLDB.

[7] Jignesh M. Patel, et al. Estimating Answer Sizes for XML Queries, 2002, EDBT.

[8] Laks V. S. Lakshmanan, et al. Colorful XML: one hierarchy isn't enough, 2004, SIGMOD '04.

[9] David J. DeWitt, et al. Shoring up persistent applications, 1994, SIGMOD '94.

[10] David J. DeWitt, et al. On supporting containment queries in relational database management systems, 2001, SIGMOD '01.

[11] Jignesh M. Patel, et al. The Michigan benchmark: towards XML query performance diagnostics, 2006, Inf. Syst.

[12] Elisa Bertino, et al. An Indexing Technique for Object-Oriented Databases, 1991.

[13] Cong Yu, et al. TIMBER: A native XML database, 2002, The VLDB Journal.

[14] Jignesh M. Patel, et al. Structural joins: a primitive for efficient XML query pattern matching, 2002, Proceedings 18th International Conference on Data Engineering.

[15] Ehud Gudes, et al. Exploiting local similarity for indexing paths in graph-structured data, 2002, Proceedings 18th International Conference on Data Engineering.

[16] Dirk Van Gucht, et al. Trie Indexes for Efficient XML Query Evaluation, 2008, WebDB.

[17] Jignesh M. Patel, et al. Structural join order selection for XML query optimization, 2003, Proceedings 19th International Conference on Data Engineering (Cat. No.03CH37405).

[18] Guido Moerkotte, et al. Indexing Multiple Sets, 1994, VLDB.

[19] Cong Yu, et al. TIMBER: a native system for querying XML, 2003, SIGMOD '03.

[20] George H. L. Fletcher, et al. A methodology for coupling fragments of XPath with structural indexes for XML documents, 2007, Inf. Syst.

[21] Cong Yu, et al. Querying structured text in an XML database, 2003, SIGMOD '03.

[22] Elisa Bertino, et al. An Indexing Technique for Object-Oriented Databases, 1991, ICDE 1991.

[23] Laks V. S. Lakshmanan, et al. Grouping in XML, 2002, EDBT Workshops.

[24] Jignesh M. Patel, et al. Using histograms to estimate answer sizes for XML queries, 2003, Inf. Syst.

[25] Roy Goldman, et al. DataGuides: Enabling Query Formulation and Optimization in Semistructured Databases, 1997, VLDB.

[26] Joe Marini, et al. Document Object Model, 2002, Encyclopedia of GIS.