Waste Not, Want Not! Managing relational data in asymmetric memories

In this thesis, we study the management of relational data in modern, i.e., asymmetric, computer systems. We explore different strategies to identify asymmetries in persistent data, map them to asymmetries in the memory landscape and, eventually, exploit them to increase query processing performance. To this end, we study memory-conscious decomposition and storage of data at different granularities: relations, vertical partitions, single attributes, and individual bits. In the interest of conciseness, we exclude techniques that require auxiliary data structures, such as indices, or horizontal partitioning, all of which come with significant maintenance overhead. Further, we argue that, when managing memory-resident data, the problem of optimal data placement is tightly connected to the efficiency of the query processing paradigm and therefore cannot be studied in isolation. Consequently, we also investigate the connection between storage model and processing paradigm. In the case of decomposition at partition granularity, we identify Just-in-Time compilation as the only viable query processing model. In the case of distribution at the granularity of individual bits, we develop a novel processing paradigm that efficiently exploits the asymmetries in the underlying data and memory components.
