Self-selecting, self-tuning, incrementally optimized indexes

In a relational data warehouse with many tables, the number of possible and promising indexes exceeds human comprehension and requires automatic index tuning. While monitoring and reactive index tuning have been proposed, adaptive indexing focuses on adapting the physical database layout for and by actual queries. "Database cracking" is one such technique. Only if and when a column is used in query predicates, an index for the column is created; and only if and when a key range is queried, the index is optimized for this key range. The effect is akin to a sort that is adaptive and incremental. This sort is, however, very inefficient, particularly when applied on block-access devices. In contrast, traditional index creation sorts data with an efficient merge sort optimized for block-access devices, but it is neither adaptive nor incremental. We propose adaptive merging, an adaptive, incremental, and efficient technique for index creation. Index optimization focuses on key ranges used in actual queries. The resulting index adapts more quickly to new data and to new query patterns than database cracking. Sort efficiency is comparable to that of traditional B-tree creation. Nonetheless, the new technique promises better query performance than database cracking, both in memory and on block-access storage.

[1]  Martin L. Kersten,et al.  Adaptive Segmentation for Scientific Databases , 2008, 2008 IEEE 24th International Conference on Data Engineering.

[2]  Theo Härder Selecting an Optimal Set of Secondary Indices , 1976, ECI.

[3]  Michael Stonebraker,et al.  The case for partial indexes , 1989, SGMD.

[4]  Surajit Chaudhuri,et al.  Physical Design Refinement: The "Merge-Reduce" Approach , 2006, EDBT.

[5]  Goetz Graefe,et al.  Sorting And Indexing With Partitioned B-Trees , 2003, CIDR.

[6]  Martin L. Kersten,et al.  Cracking the Database Store , 2005, CIDR.

[7]  R. Bayer,et al.  Organization and maintenance of large ordered indices , 1970, SIGFIDET '70.

[8]  David B. Lomet,et al.  Alphasort: A cache-sensitive parallel external sort , 1995, The VLDB Journal.

[9]  Michael Stonebraker The choice of partial inversions and combined indices , 2004, International Journal of Computer & Information Sciences.

[10]  Martin L. Kersten,et al.  Updating a cracked database , 2007, SIGMOD '07.

[11]  Martin L. Kersten,et al.  Self-organizing strategies for a column-store database , 2008, EDBT '08.

[12]  Rohit Jain,et al.  Efficient Search of Multi-Dimensional B-Trees , 1995, VLDB.

[13]  Goetz Graefe,et al.  Implementing sorting in database systems , 2006, CSUR.

[14]  Jon Louis Bentley,et al.  Engineering a sort function , 1993, Softw. Pract. Exp..

[15]  Rudolf Bayer,et al.  Organization and maintenance of large ordered indexes , 1972, Acta Informatica.

[16]  Goetz Graefe,et al.  The five-minute rule twenty years later, and how flash memory changes the rules , 2007, DaMoN '07.

[17]  Goetz Graefe,et al.  Hierarchical locking in B-tree indexes , 2007, BTW.

[18]  Rudolf Bayer,et al.  Prefix B-trees , 1977, TODS.

[19]  Surajit Chaudhuri,et al.  Table of Contents (pdf) , 2007, VLDB.

[20]  Praveen Seshadri,et al.  Generalized partial indexes , 1995, Proceedings of the Eleventh International Conference on Data Engineering.

[21]  Rudolf Bayer,et al.  The Universal B-Tree for Multidimensional Indexing: general Concepts , 1997, WWCA.

[22]  C. A. R. Hoare,et al.  Algorithm 64: Quicksort , 1961, Commun. ACM.

[23]  Sam Lightstone,et al.  Physical Database Design for Relational Databases , 2009, Encyclopedia of Database Systems.