Data Morphing: An Adaptive, Cache-Conscious Storage Technique

The number of processor cache misses has a critical impact on the performance of DBMSs running on servers with large main-memory configurations. In turn, the cache utilization of a database system depends heavily on the physical organization of records in main memory. A recently proposed storage model, called PAX, was shown to greatly improve the performance of sequential file-scan operations compared to the commonly implemented N-ary storage model. However, the PAX storage model can exhibit poor cache utilization for other common operations, such as index scans. Under a workload of heterogeneous database operations, neither the PAX storage model nor the N-ary storage model is optimal. In this paper, we propose a flexible data storage technique called Data Morphing. Using Data Morphing, a cache-efficient attribute layout, called a partition, is first determined through an analysis of the query workload. This partition is then used as a template for storing data in a cache-efficient way. We present two algorithms for computing partitions, as well as a versatile storage model that accommodates the dynamic reorganization of the attributes in a file. Finally, we experimentally demonstrate that the Data Morphing technique provides a significant performance improvement over both the traditional N-ary storage model and the PAX model.
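
To make the idea of a workload-driven partition concrete, the following is a minimal sketch and not either of the two algorithms presented in the paper. It assumes a deliberately simple cost model: each attribute group of a record is stored contiguously and starts on a cache-line boundary, every query reads one record at a time (as an index scan would), and a query pays for every cache line of every group that contains at least one attribute it touches. The attribute names, sizes, query weights, and the functions `set_partitions`, `lines_touched`, `workload_cost`, and `best_partition` are all hypothetical and chosen only for illustration.

```python
from math import ceil

# Hypothetical attribute sizes in bytes (assumption, not from the paper).
SIZES = {"id": 8, "name": 56, "price": 8, "qty": 8, "notes": 120}

# Hypothetical workload: (set of attributes read per record, relative frequency).
WORKLOAD = [
    ({"price", "qty"}, 10),
    ({"id", "name"}, 5),
    ({"notes"}, 1),
]


def set_partitions(items):
    """Yield every partition of `items` as a list of groups (lists)."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for partial in set_partitions(rest):
        # Place `first` into each existing group in turn...
        for i in range(len(partial)):
            yield partial[:i] + [[first] + partial[i]] + partial[i + 1:]
        # ...or give it a group of its own.
        yield [[first]] + partial


def lines_touched(partition, query, sizes, line_size=64):
    """Cache lines loaded to read `query` from one record (simplified model)."""
    total = 0
    for group in partition:
        if query & set(group):
            # Assumption: the whole group slice is fetched, line-aligned.
            total += ceil(sum(sizes[a] for a in group) / line_size)
    return total


def workload_cost(partition, workload, sizes, line_size=64):
    """Frequency-weighted cache lines touched across the workload."""
    return sum(w * lines_touched(partition, q, sizes, line_size)
               for q, w in workload)


def best_partition(sizes, workload, line_size=64):
    """Exhaustively search all partitions for the cheapest one."""
    return min(set_partitions(list(sizes)),
               key=lambda p: workload_cost(p, workload, sizes, line_size))


if __name__ == "__main__":
    nary = [list(SIZES)]              # all attributes in one group
    pax = [[a] for a in SIZES]        # every attribute in its own group
    best = best_partition(SIZES, WORKLOAD)
    print("N-ary-like cost:", workload_cost(nary, WORKLOAD, SIZES))
    print("PAX-like cost:  ", workload_cost(pax, WORKLOAD, SIZES))
    print("best partition: ", best, workload_cost(best, WORKLOAD, SIZES))
```

Under these assumptions, the single-group layout and the one-attribute-per-group layout are simply the two extreme partitions, and the search can settle on a layout in between that groups attributes accessed together. Exhaustive enumeration is only feasible for a handful of attributes, because the number of set partitions grows as the Bell numbers (52 partitions for the five attributes here).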
