GridTables: A One-Size-Fits-Most H2TAP Data Store

Heterogeneous Hybrid Transactional/Analytical Processing (H²TAP) database systems have been developed to meet the requirements of low-latency analysis of real-time operational data. Because of their technical challenges, these systems are hard to architect, non-trivial to engineer, and complex to administer. Current research has proposed excellent solutions to many of these challenges in isolation, but a unified engine that combines these solutions to optimize performance is still missing. In this concept paper, we propose a highly flexible and adaptive data structure, called gridtable, to physically organize sparse but structured records in the context of H²TAP. To this end, we focus on the design of an efficient, highly flexible storage layout that is built from scratch for mixed query workloads. The key challenges we address are (1) partial storage in different memory locations and (2) the ability to optimize for mixed OLTP/OLAP access patterns. To guarantee safe and well-specified data definition and manipulation, as well as fast querying with no compromise on performance, we propose two dedicated access paths to the storage. In this paper, we explore the architecture and internals of gridtables, presenting their design goals, concepts, and trade-offs. We close with open research questions and challenges that must be addressed to take full advantage of the flexibility of our solution.
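To make the abstract's three ingredients concrete (sparse records split into independently stored pieces, a per-piece choice of layout and memory location, and two separate access paths), here is a minimal illustrative sketch. It is not the gridtable design from the paper. It assumes that a table is cut into a grid of cells, each covering a range of record ids and a group of attributes, and that each cell picks a "row" or "column" layout and carries a memory-location label. The names `Cell`, `GridTable`, `update`, `scan`, and the `location` values are all hypothetical. `location` is only a tag here; no real device memory is involved.

```python
from dataclasses import dataclass
from typing import Any, Dict, Iterator, List, Tuple

# Hypothetical sketch, not the paper's API: one cell of the grid covers a
# contiguous range of record ids and a group of attributes.
@dataclass
class Cell:
    rows: range               # record ids covered by this cell
    attrs: Tuple[str, ...]    # attributes covered by this cell
    layout: str = "column"    # "row" (OLTP-friendly) or "column" (OLAP-friendly)
    location: str = "host"    # memory-location label only (assumption)
    data: Any = None

    def __post_init__(self) -> None:
        if self.layout == "column":
            # One list per attribute; missing (sparse) values stay None.
            self.data = {a: [None] * len(self.rows) for a in self.attrs}
        else:
            # One dict per record holding only its non-null values.
            self.data = [dict() for _ in self.rows]

    def put(self, rid: int, attr: str, value: Any) -> None:
        i = rid - self.rows.start
        if self.layout == "column":
            self.data[attr][i] = value
        elif value is None:
            self.data[i].pop(attr, None)
        else:
            self.data[i][attr] = value


class GridTable:
    def __init__(self, schema: Dict[str, type], cells: List[Cell]) -> None:
        self.schema = schema
        self.cells = cells

    def _cell(self, rid: int, attr: str) -> Cell:
        for c in self.cells:
            if rid in c.rows and attr in c.attrs:
                return c
        raise KeyError((rid, attr))

    # Assumed access path 1: checked manipulation. Every value is validated
    # against the schema and its cell is resolved before anything is written.
    def update(self, rid: int, values: Dict[str, Any]) -> None:
        targets = []
        for attr, v in values.items():
            if attr not in self.schema:
                raise KeyError(f"unknown attribute {attr!r}")
            if v is not None and not isinstance(v, self.schema[attr]):
                raise TypeError(f"{attr!r} expects {self.schema[attr].__name__}")
            targets.append((self._cell(rid, attr), attr, v))
        for cell, attr, v in targets:
            cell.put(rid, attr, v)

    # Assumed access path 2: unchecked read path for analytical scans that
    # yields the non-null values of one attribute, cell by cell.
    def scan(self, attr: str) -> Iterator[Any]:
        for c in self.cells:
            if attr not in c.attrs:
                continue
            if c.layout == "column":
                yield from (v for v in c.data[attr] if v is not None)
            else:
                yield from (r[attr] for r in c.data if attr in r)


schema = {"id": int, "price": float, "note": str}
grid = GridTable(schema, [
    Cell(range(0, 2), ("id", "price"), layout="column", location="host"),
    Cell(range(2, 4), ("id", "price"), layout="column", location="device"),
    Cell(range(0, 4), ("note",), layout="row", location="host"),
])
grid.update(0, {"id": 0, "price": 9.5})
grid.update(3, {"id": 3, "price": 2.5, "note": "promo"})
print(sum(grid.scan("price")))  # 12.0
```

The split between `update` and `scan` mirrors the abstract's idea of two dedicated access paths. Writes pay for validation so that the stored data stay well specified, while reads go straight to each cell's native layout. How the real gridtable chooses cell boundaries, layouts, and placements is left to the paper.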
