Efficient data organization and management on heterogeneous storage hierarchies

To keep design and implementation simple, data organization and management in current database systems rest on simple assumptions about storage devices and workload characteristics. This has been the major design principle since the inception of database systems. While the device- and workload-oblivious approach worked well in the past, it falls short of today's demands for fast data processing on large-scale datasets with varied characteristics. Ignoring the rich and diverse features of both devices and workloads imposes unnecessary performance trade-offs on existing database systems. This dissertation proposes efficient, flexible, and robust data organization and management for database systems by enhancing their interaction with workloads and hardware devices. It achieves this goal in three steps. First, a microbenchmark suite is needed for quick and accurate evaluation. The proposed solution is DBmbench, a significantly reduced database microbenchmark suite that simulates OLTP and DSS workloads. DBmbench enables quick evaluation and provides performance forecasting for real large-scale benchmarks. Second, Clotho investigates how to build a workload-conscious buffer pool manager by utilizing query payload information. Clotho decouples the in-memory page layout from the storage organization by using a new query-specific layout called CSM. Due to its adaptive structure, CSM eliminates the long-standing performance trade-offs of NSM and DSM, thus achieving good performance for both DSS and OLTP applications, two predominant database workloads with conflicting characteristics. Clotho demonstrates that simple workload information, such as query payloads, is of great value in improving performance without increasing complexity. The third step looks at how to use hardware information to eliminate performance trade-offs in existing device-oblivious designs.
MultiMap is first proposed as a new mapping algorithm that stores multidimensional data on disks without losing spatial locality. MultiMap exploits the new adjacency model of disks to build a multidimensional structure on top of the linear disk space. It outperforms existing mapping algorithms on various spatial queries. Later, MultiMap is extended to organize intermediate results for hash join and external sorting, where the I/O performance of different execution phases exhibits trade-offs similar to those in 2-D data accesses. Our prototype demonstrates up to a 2x improvement over the existing implementation in memory-limited executions. These two projects complement Clotho by showing the benefits of exploiting detailed hardware features.
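
The contrast between a row-oriented layout (NSM), a column-oriented layout (DSM), and a query-specific in-memory page can be illustrated with a minimal Python sketch. It is not Clotho's implementation or the actual CSM format; the schema, the function names, and the dictionary-based page representation are all assumptions made for illustration only.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical three-attribute schema, used only for this illustration.
SCHEMA: Tuple[str, ...] = ("id", "name", "balance")
Row = Tuple[int, str, float]


def nsm_page(rows: Sequence[Row]) -> List[Row]:
    """Row-oriented (NSM-style) layout: each record is stored contiguously."""
    return list(rows)


def dsm_columns(rows: Sequence[Row],
                schema: Sequence[str] = SCHEMA) -> Dict[str, list]:
    """Column-oriented (DSM-style) layout: one value list per attribute."""
    return {attr: [row[i] for row in rows] for i, attr in enumerate(schema)}


def query_specific_page(columns: Dict[str, list],
                        payload: Sequence[str]) -> Dict[str, list]:
    """Assumed query-specific layout: keep only the attributes in the query payload."""
    return {attr: columns[attr] for attr in payload}


rows = [(1, "alice", 10.0), (2, "bob", 20.0)]
columns = dsm_columns(rows)
# A query that reads only `id` and `balance` gets a page without `name`.
page = query_specific_page(columns, ["id", "balance"])
```

In this sketch a scan that needs only two attributes never touches the third, while a query that asks for the full payload still sees every attribute of a record in one page, which is the kind of trade-off a query-specific layout is meant to avoid.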
