Autonomous Index Optimization in XML Databases

Defining suitable indexes is a major task when optimizing a database. Usually, a human database administrator defines a set of indexes during the design phase of the database, either manually or with the help of so-called index wizard tools that analyze predefined database operations. Even with an optimal initial set of indexes, there is no guarantee that these indexes will suit future demands. Rather, the typical usage of the database is likely to change over time, for instance because new queries appear. As a consequence, the existing indexes become suboptimal. The typical way to handle this problem is to have a database administrator maintain the database continuously. In XML database management systems (XDBMS) the problem is even worse: because XML queries cover both content and structure, the number of possible queries and indexes is significantly higher. Additionally, for XML data without schema information, queries and indexes cannot be defined in advance, because neither the structure nor the content of the data is restricted. Both facts tend to result in higher maintenance costs for XML indexes than for relational indexes. In this paper we show by performance measurements that an adaptive XDBMS that periodically analyzes its workload and automatically creates and drops XML indexes guarantees high performance over the entire lifetime of a database. Although we present our own index system, called KeyX, the idea and the results are transferable to other XML indexing approaches.
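
The periodic "analyze the workload, then create or drop indexes" cycle described above can be illustrated with a minimal sketch. This is not the KeyX algorithm, which the abstract does not specify. It assumes a simple frequency heuristic in which each logged query is reduced to a single path expression. The function name `plan_index_changes`, the parameters `max_indexes` and `min_frequency`, and the sample paths are all hypothetical.

```python
from collections import Counter
from typing import Iterable, Set, Tuple


def plan_index_changes(
    workload: Iterable[str],
    current_indexes: Set[str],
    max_indexes: int,
    min_frequency: int = 2,
) -> Tuple[Set[str], Set[str]]:
    """Return (to_create, to_drop) for one periodic analysis run.

    Hypothetical heuristic: keep an index on each of the most frequently
    queried path expressions, up to max_indexes. A real system would
    also weigh update costs and index size.
    """
    counts = Counter(workload)
    # Paths seen at least min_frequency times, most frequent first;
    # ties are broken alphabetically so the result is deterministic.
    ranked = sorted(
        (path for path, n in counts.items() if n >= min_frequency),
        key=lambda path: (-counts[path], path),
    )
    desired = set(ranked[:max_indexes])
    return desired - current_indexes, current_indexes - desired


# One analysis period over a hypothetical query log.
log = (
    ["/site/people/person/name"] * 5
    + ["/site/regions//item/price"] * 3
    + ["/site/categories/category"]
)
create, drop = plan_index_changes(
    log, {"/site/categories/category"}, max_indexes=2
)
print(sorted(create))
print(sorted(drop))
```

Running such a planner at fixed intervals lets the index set follow a shifting workload: paths that become frequent gain an index, and indexes whose paths fall out of use are dropped.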
