Options in physical database design

A cornerstone of modern database systems is physical data independence, i.e., the separation of a type and its associated operations from its physical representation in memory and on storage media. Users manipulate and query data at the logical level; the DBMS translates these logical operations into operations on files, indices, records, and disks. The efficiency of these physical operations depends heavily on the choice of data representations. Choosing a physical representation for a logical database is called physical database design. The number of possible choices in physical database design is very large, and the choices often interact with each other. We attempt to list and classify these choices and to explore their interactions. The purpose of this paper is to provide an overview of the possible options for the DBMS developer and some guidance for the DBMS administrator and user. While much of our discussion draws on the relational data model, physical database design is even more important for object-oriented and extensible systems. The reasons are simple. First, the number of logical data types and their operations is larger, which both requires and permits more choices for their representation. Second, the state of the art in query optimization for these systems is much less developed than for relational systems, which makes careful physical database design all the more imperative for object-oriented database systems.
