Physical database design for relational databases

This paper describes the concepts used in the implementation of DBDSGN, an experimental physical design tool for relational databases developed at the IBM San Jose Research Laboratory. Given a workload for System R (consisting of a set of SQL statements and their execution frequencies), DBDSGN suggests physical configurations for efficient performance. Each configuration consists of a set of indices and an ordering for each table. Workload statements are evaluated only for atomic configurations of indices, which have only one index per table. Costs for any configuration can be obtained from those of the atomic configurations. DBDSGN uses information supplied by the System R optimizer both to determine which columns might be worth indexing and to obtain estimates of the cost of executing statements in different configurations. The tool finds efficient solutions to the index-selection problem; if we assume the cost estimates supplied by the optimizer are the actual execution costs, it finds the optimal solution. Optionally, heuristics can be used to reduce execution time. The approach taken by DBDSGN in solving the index-selection problem for multiple-table statements significantly reduces the complexity of the problem. DBDSGN's principles were used in the Relational Design Tool (RDT), an IBM product based on DBDSGN, which performs design for SQL/DS, a relational system based on System R. System R actually uses DBDSGN's suggested solutions as the tool expects because cost estimates and other necessary information can be obtained from System R using a new SQL statement, the EXPLAIN statement. This illustrates how a system can export a model of its internal assumptions and behavior so that other systems (such as tools) can share this model.

[1]  Mario Schkolnick,et al.  The Optimal Selection of Secondary Indices for Files , 1975, Inf. Syst..

[2]  Alfonso F. Cardenas Analysis and performance of inverted data base structures , 1975, CACM.

[3]  Eugene Wong,et al.  Decomposition—a strategy for query processing , 1976, TODS.

[4]  Michael Stonebraker,et al.  The design and implementation of INGRES , 1976, TODS.

[5]  Arvola Chan,et al.  Index selection in a self-adaptive data base management system , 1976, SIGMOD '76.

[6]  Irving L. Traiger,et al.  System R: relational approach to database management , 1976, TODS.

[7]  Donald D. Chamberlin,et al.  SEQUEL 2: A Unified Approach to Data Definition, Manipulation, and Control , 1976, IBM J. Res. Dev..

[8]  P. Bruce Berra,et al.  Minimum cost selection of secondary indexes for formatted files , 1977, TODS.

[9]  Douglas Comer,et al.  The difficulty of optimum index selection , 1978, TODS.

[10]  Mario Schkolnick A Survey of Physical Database Design Methodology and Techniques , 1978, VLDB.

[11]  Paolo Tiberio,et al.  Considerations in developing a design tool for a relational DBMS , 1979, COMPSAC.

[12]  Douglas Comer,et al.  Ubiquitous B-Tree , 1979, CSUR.

[13]  Michael Stonebraker,et al.  Performance analysis of a relational data base management system , 1979, SIGMOD '79.

[14]  Anne Putkonen On the selection of the access path in inverted database organization , 1979, Inf. Syst..

[15]  Göran Goldkuhl,et al.  A systematic approach to information systems development-I - Introduction , 1979, Inf. Syst..

[16]  Eugene Wong,et al.  Query Processing In A Relational Database Management System , 1979, Fifth International Conference on Very Large Data Bases, 1979..

[17]  John G. Kollias A heuristic approach for determining the optimal degree of file inversion , 1979, Inf. Syst..

[18]  Won Kim,et al.  Performance of the System R Access Path Selection Mechanism , 1980, IFIP Congress.

[19]  Patricia G. Selinger,et al.  Support for repetitive transactions and ad hoc queries in System R , 1981, TODS.

[20]  Gio Wiederhold,et al.  Separability - an approach to physical data base design , 1981, VLDB 1981.

[21]  Irving L. Traiger,et al.  A history and evaluation of System R , 1981, CACM.

[22]  Irving L. Traiger,et al.  System R: An Architectural Overview , 1999, IBM Syst. J..

[23]  Won Kim,et al.  On optimizing an SQL-like nested query , 1982, TODS.

[24]  Michael L. Brodie,et al.  Relational Database Systems: Analysis and Comparison , 1983 .

[25]  Kyu-Young Whang A physical database design methodology using the property of separability , 1983 .

[26]  Paolo Tiberio,et al.  A Separability-Based Method for Secondary Index Selection in Physical Database Design , 1983, Methodology and Tools for Data Base Design.

[27]  Vijay V. Raghavan,et al.  On the Selection of an Optimal Set of Indexes , 1983, IEEE Transactions on Software Engineering.

[28]  Stavros Christodoulakis,et al.  Implications of certain assumptions in database performance evauation , 1984, TODS.

[29]  Gio Wiederhold,et al.  Separability —An Approach to Physical Database Design , 1984, IEEE Transactions on Computers.

[30]  Paolo Tiberio,et al.  An Approximation Algorithm for Secondary Index Selection in Relational Database Physical Design , 1985, Comput. J..

[31]  Paolo Tiberio,et al.  Estimating the cost of updates in a relational database , 1985, TODS.

[32]  John G. Kollias,et al.  On the Selection of a Reduced Set of Indexes , 1985, Comput. J..

[33]  Guy M. Lohman,et al.  R* optimizer validation and performance evaluation for local queries , 1986, SIGMOD '86.

[34]  Kyu-Young Whang,et al.  Approximating the number of unique values of an attribute without sorting , 1987, Inf. Syst..

[35]  Harry K. T. Wong,et al.  Optimization of nested SQL queries revisited , 1987, SIGMOD '87.