Compressing Very Large Database Workloads for Continuous Online Index Selection

The paper presents a novel method for compressing large database workloads for the purpose of autonomic, continuous index selection. The compressed workload contains a small subset of representative queries from the original workload. A single-pass clustering algorithm with a simple selectivity-based query distance metric keeps memory and time complexity low. Experiments on two real-world database workloads show that the method achieves a high compression ratio without degrading the quality of the index selection solutions.
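The general idea of single-pass clustering over a query stream can be illustrated with a minimal Python sketch. It is not the paper's algorithm: the abstract does not define the distance metric, so the `distance` function below (mean absolute difference of per-column selectivities, with unrestricted columns treated as selectivity 1.0), the `Query` and `Cluster` types, and the `threshold` parameter are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Query:
    sql: str
    # Assumed representation: column name -> estimated predicate selectivity in (0, 1].
    selectivity: Dict[str, float]


@dataclass
class Cluster:
    representative: Query  # first query seen for this cluster
    weight: int = 1        # number of original queries the cluster stands for


def distance(a: Query, b: Query) -> float:
    """Hypothetical selectivity-based distance in [0, 1).

    A column that a query does not restrict is treated as selectivity 1.0
    (every row qualifies). This is an assumption, not the paper's metric.
    """
    columns = set(a.selectivity) | set(b.selectivity)
    if not columns:
        return 0.0
    total = sum(
        abs(a.selectivity.get(c, 1.0) - b.selectivity.get(c, 1.0))
        for c in columns
    )
    return total / len(columns)


def compress(workload: List[Query], threshold: float) -> List[Cluster]:
    """Single-pass (leader-style) clustering: each query is read exactly once."""
    clusters: List[Cluster] = []
    for query in workload:
        nearest = min(
            clusters,
            key=lambda c: distance(query, c.representative),
            default=None,
        )
        if nearest is not None and distance(query, nearest.representative) <= threshold:
            nearest.weight += 1              # absorbed by an existing representative
        else:
            clusters.append(Cluster(query))  # query becomes a new representative
    return clusters
```

In this sketch the compressed workload is the list of cluster representatives, each carrying a weight that records how many original queries it replaces. Memory grows with the number of clusters rather than with the size of the workload, which is the property that makes a single-pass scheme attractive for continuous tuning.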
