High Performance Transactions in Deuteronomy

The Deuteronomy architecture cleanly separates transaction functionality, performed in a transaction component (TC), from data management functionality, performed in a data component (DC). In prior work we implemented both a TC and a DC that achieved modest performance. We recently built a new DC, the Bw-tree key-value store, that achieves very high performance on modern hardware and currently ships as an indexing and storage layer in a number of Microsoft systems. This new DC executes operations more than 100× faster than the TC we previously implemented. This paper describes how we achieved a two-orders-of-magnitude speedup in TC performance and shows that a full Deuteronomy stack can achieve very high performance overall. Importantly, the resulting full stack caches data residing on secondary storage while exhibiting performance on par with main-memory systems. Our new prototype TC, combined with the previously re-architected DC, scales to effectively use 48 hardware threads on our 4-socket NUMA machine and commits more than 1.5 million transactions per second (6 million total operations per second) for a variety of workloads.
