Accelerating Analytical Processing in MVCC using Fine-Granular High-Frequency Virtual Snapshotting

Efficient transaction management is a delicate task. As systems face transactions of inherently different types, ranging from point updates to long-running analytical queries, it is hard to satisfy their requirements with a single execution engine. Unfortunately, most systems rely on such a design that implements its parallelism using multi-version concurrency control. While MVCC parallelizes short-running OLTP transactions well, it struggles in the presence of mixed workloads containing long-running OLAP queries, as scans have to work their way through vast amounts of versioned data. To overcome this problem, we reintroduce the concept of hybrid processing and combine it with state-of-the-art MVCC: OLAP queries are outsourced to run on separate virtual snapshots while OLTP transactions run on the most recent version of the database. Inside both execution engines, we still apply MVCC. The most significant challenge of a hybrid approach is to generate the snapshots at a high frequency. Previous approaches heavily suffered from the high cost of snapshot creation. In our approach termed AnKer, we follow the current trend of co-designing underlying system components and the DBMS, to overcome the restrictions of the OS by introducing a custom system call vm_snapshot. It allows fine-granular snapshot creation that is orders of magnitudes faster than state-of-the-art approaches. Our experimental evaluation on an HTAP workload based on TPC-C transactions and OLAP queries show that our snapshotting mechanism is more than a factor of 100x faster than fork-based snapshotting and that the latency of OLAP queries is up to a factor of 4x lower than MVCC in a single execution engine. Besides, our approach enables a higher OLTP throughput than all state-of-the-art methods.

[1]  Ippokratis Pandis,et al.  ERMIA: Fast Memory-Optimized Database System for Heterogeneous Workloads , 2016, SIGMOD Conference.

[2]  Hideaki Kimura,et al.  Mostly-Optimistic Concurrency Control for Highly Contended Dynamic Workloads on a Thousand Cores , 2016, Proc. VLDB Endow..

[3]  Shan Wang,et al.  SwingDB: An Embedded In-memory DBMS Enabling Instant Snapshot Sharing , 2016, ADMS/IMDM@VLDB.

[4]  Dan R. K. Ports,et al.  Serializable Snapshot Isolation in PostgreSQL , 2012, Proc. VLDB Endow..

[5]  Wolfgang Lehner,et al.  SAP HANA database: data management for modern business applications , 2012, SGMD.

[6]  Hamid Pirahesh,et al.  Efficient and flexible methods for transient versioning of records to avoid locking by read-only transactions , 1992, SIGMOD '92.

[7]  Jignesh M. Patel,et al.  High-Performance Concurrency Control Mechanisms for Main-Memory Databases , 2011, Proc. VLDB Endow..

[8]  Alfons Kemper,et al.  How to efficiently snapshot transactional data: hardware or software controlled? , 2011, DaMoN '11.

[9]  Eddie Kohler,et al.  Speedy transactions in multicore in-memory databases , 2013, SOSP.

[10]  Gerhard Weikum,et al.  CHAPTER FOUR – Concurrency Control Algorithms , 2002 .

[11]  Gerhard Fettweis,et al.  Overview on Hardware Optimizations for Database Engines , 2017, BTW.

[12]  Alfons Kemper,et al.  Fast Serializable Multi-Version Concurrency Control for Main-Memory Database Systems , 2015, SIGMOD Conference.

[13]  Gustavo Alonso,et al.  Centaur: A Framework for Hybrid CPU-FPGA Databases , 2017, 2017 IEEE 25th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM).

[14]  Alfons Kemper,et al.  HyPer: A hybrid OLTP&OLAP main memory database system based on virtual memory snapshots , 2011, 2011 IEEE 27th International Conference on Data Engineering.

[15]  Jens Dittrich,et al.  RUMA has it: Rewired User-space Memory Access is Possible! , 2016, Proc. VLDB Endow..

[16]  Marc H. Scholl,et al.  Transactional information systems: theory, algorithms, and the practice of concurrency control and recovery , 2001, SGMD.

[17]  Gustavo Alonso,et al.  Customized OS support for data-processing , 2016, DaMoN '16.

[18]  Ippokratis Pandis,et al.  Efficiently making (almost) any concurrency control mechanism serializable , 2016, The VLDB Journal.

[19]  Gottfried Vossen,et al.  Transactional Information Systems: Theory, Algorithms, and the Practice of Concurrency Control and Recovery , 2002 .

[20]  Andrew Pavlo,et al.  An Empirical Evaluation of In-Memory Multi-Version Concurrency Control , 2017, Proc. VLDB Endow..

[21]  Patrick E. O'Neil,et al.  A read-only transaction anomaly under snapshot isolation , 2004, SGMD.

[22]  Craig Freedman,et al.  Hekaton: SQL server's memory-optimized OLTP engine , 2013, SIGMOD '13.

[23]  Michael Stonebraker,et al.  Staring into the Abyss: An Evaluation of Concurrency Control with One Thousand Cores , 2014, Proc. VLDB Endow..