An index selection method without repeated optimizer estimations

The index selection problem (ISP) concerns the selection of an appropriate index set to minimize the total cost for a given workload containing read and update queries. Since the ISP has been proven to be an NP-hard problem, most studies focus on heuristic algorithms to obtain approximate solutions. However, even approximate algorithms still consume a large amount of computing time and disk space because these systems must record all query statements and frequently request from the database optimizers the cost estimation of each query in each considered index. This study proposes a novel algorithm without repeated optimizer estimations. When a query is delivered to a database system, the optimizer evaluates the costs of various query plans and chooses an access path for the query. The information from the evaluation stage is aggregated and recorded with limited space. The proposed algorithm can recommend indexes according to the readily available information without querying the optimizer again. The proposed algorithm was tested in a PostgreSQL database system using TPC-H data. Experimental results show the effectiveness of the proposed approach.

[1]  Vijay V. Raghavan,et al.  On the Selection of an Optimal Set of Indexes , 1983, IEEE Transactions on Software Engineering.

[2]  Yishai A. Feldman,et al.  A knowledge-based approach for index selection in relational databases , 2003, Expert Syst. Appl..

[3]  Goetz Graefe,et al.  Query evaluation techniques for large databases , 1993, CSUR.

[4]  Zhen He,et al.  Cost modeling of spatial operators using non-parametric regression , 2007, Inf. Sci..

[5]  Jeffrey D. Ullman,et al.  Index selection for OLAP , 1997, Proceedings 13th International Conference on Data Engineering.

[6]  Matteo Fischetti,et al.  Exact and Approximate Algorithms for the Index Selection Problem in Physical Database Design , 1995, IEEE Trans. Knowl. Data Eng..

[7]  Elena Barcucci,et al.  Index Selection in a Distributed Data Base , 1984, DDSS.

[8]  Eugene Wong,et al.  Query processing in a system for distributed databases (SDD-1) , 1981, TODS.

[9]  Egon Balas,et al.  An Algorithm for Large Zero-One Knapsack Problems , 1980, Oper. Res..

[10]  Elena Barcucci,et al.  Index Selection in Relational Databases , 1989, MFDBS.

[11]  Philip S. Yu,et al.  On Index Selection Schemes for Nested Object Hierarchies , 1994, VLDB.

[12]  M. Schkolnick,et al.  Physical database design for relational databases , 1988, TODS.

[13]  Matthias Jarke,et al.  Query Optimization in Database Systems , 1984, CSUR.

[14]  Douglas Comer,et al.  The difficulty of optimum index selection , 1978, TODS.

[15]  Habiba Drias,et al.  Selection and Pruning Algorithms for Bitmap Index Selection Problem Using Data Mining , 2007, DaWaK.

[16]  Surajit Chaudhuri,et al.  Microsoft index turning wizard for SQL Server 7.0 , 1998, SIGMOD '98.

[17]  Renzo Sprugnoli,et al.  Optimal Selection of Secondary Indexes , 1990, IEEE Trans. Software Eng..

[18]  Wu Jigang,et al.  An efficient algorithm for the collapsing knapsack problem , 2006, Inf. Sci..

[19]  Sam Lightstone,et al.  Toward autonomic computing with DB2 universal database , 2002, SGMD.

[20]  Surajit Chaudhuri,et al.  An Efficient Cost-Driven Index Selection Tool for Microsoft SQL Server , 1997, VLDB.

[21]  Jeffrey A. Hoffer An integer programming formulation of computer data base design problems , 1976, Inf. Sci..

[22]  Ramtjander Choenni On the automation of physical database design , 1993, SAC '93.

[23]  Sam Lightstone,et al.  Self-managing technology in IBM DB2 universal database , 2001, CIKM '01.

[24]  Dennis Shasha,et al.  A Framework for Automating Physical Database Design , 1991, VLDB.