Index selection for databases: a hardness study and a principled heuristic solution

We study the index selection problem: Given a workload consisting of SQL statements on a database, and a user-specified storage constraint, recommend a set of indexes that have the maximum benefit for the given workload. We present a formal statement for this problem and show that it is computationally "hard" to solve or even approximate it. We develop a new algorithm for the problem which is based on treating the problem as a knapsack problem. The novelty of our approach lies in an LP (linear programming) based method that assigns benefits to individual indexes. For a slightly modified algorithm, that does more work, we prove that we can give instance specific guarantees about the quality of our solution. We conduct an extensive experimental evaluation of this new heuristic and compare it with previous solutions. Our results demonstrate that our solution is more scalable while achieving comparable quality.

[1]  Lars Engebretsen Lower Bounds for non-Boolean Constraint Satisfaction Programs , 2000 .

[2]  Surajit Chaudhuri,et al.  An Efficient Cost-Driven Index Selection Tool for Microsoft SQL Server , 1997, VLDB.

[3]  Johan Håstad,et al.  Some optimal inapproximability results , 2001, JACM.

[4]  Uriel Feige,et al.  The Dense k -Subgraph Problem , 2001, Algorithmica.

[5]  David P. Williamson,et al.  .879-approximation algorithms for MAX CUT and MAX 2SAT , 1994, STOC '94.

[6]  Surajit Chaudhuri,et al.  Automated Selection of Materialized Views and Indexes in SQL Databases , 2000, VLDB.

[7]  Dennis Shasha,et al.  A Framework for Automating Physical Database Design , 1991, VLDB.

[8]  Howard J. Karloff,et al.  On the complexity of the view-selection problem , 1999, PODS '99.

[9]  Jeffrey D. Ullman,et al.  Implementing data cubes efficiently , 1996, SIGMOD '96.

[10]  G. Nemhauser,et al.  Integer Programming , 2020 .

[11]  Uriel Feige,et al.  Approximating the value of two power proof systems, with applications to MAX 2SAT and MAX DICUT , 1995, Proceedings Third Israel Symposium on the Theory of Computing and Systems.

[12]  Daniel C. Zilio,et al.  DB2 advisor: an optimizer smart enough to recommend its own indexes , 2000, Proceedings of 16th International Conference on Data Engineering (Cat. No.00CB37073).

[13]  Dorit S. Hochba,et al.  Approximation Algorithms for NP-Hard Problems , 1997, SIGA.

[14]  Surajit Chaudhuri,et al.  AutoAdmin “what-if” index analysis utility , 1998, SIGMOD '98.

[15]  Arvola Chan,et al.  Index selection in a self-adaptive data base management system , 1976, SIGMOD '76.

[16]  Jeffrey D. Ullman,et al.  Index selection for OLAP , 1997, Proceedings 13th International Conference on Data Engineering.

[17]  Maria J. Serna,et al.  The (Parallel) Approximability of Non-Boolean Satisfiability Problems and Restricted Integer Programming , 1998, STACS.

[18]  Mihir Bellare,et al.  Interactive proofs and approximation: reductions from two provers in one round , 1993, [1993] The 2nd Israel Symposium on Theory and Computing Systems.

[19]  Douglas Comer,et al.  The difficulty of optimum index selection , 1978, TODS.

[20]  Henk M. Blanken,et al.  Index selection in relational databases , 1993, Proceedings of ICCI'93: 5th International Conference on Computing and Information.

[21]  Jeffrey F. Naughton,et al.  Materialized View Selection for Multi-Cube Data Models , 2000, EDBT.

[22]  Leonard Pitt,et al.  Optimal indexing using near-minimal space , 2003, PODS '03.