Cache Conscious Data Layouting for In-Memory Databases

Many applications with manually implemented data management exhibit a data storage pattern in which semantically related data items are stored closer together in memory than unrelated data items. The strong semantic relationship between these data items commonly induces accesses to them that are close together in time. This is called the principle of data locality; it has been recognized by hardware vendors and is commonly exploited to improve hardware performance. General-purpose Database Management Systems (DBMSs), whose main goal is to simplify optimal data storage and processing, generally fall short of this goal because the usage pattern of the stored data cannot be anticipated when the system is designed. The current interest in column-oriented databases indicates that one storage strategy does not fit all applications. A DBMS that automatically adapts its storage strategy to the workload of the database promises a significant performance increase by maximizing the benefit of hardware optimizations that are based on the principle of data locality. This thesis gives an overview of such optimizations and the effect they have on the data access performance of applications. Based on the findings, a model is introduced that estimates the cost of data accesses from the arrangement of the data in main memory. This model is evaluated through a series of experiments and incorporated into an automatic layouting component for a DBMS. This layouting component allows the calculation of an analytically optimal storage layout. The performance benefits of this component are evaluated in an application benchmark.
