Rethinking main memory OLTP recovery

Fine-grained, record-oriented write-ahead logging, as exemplified by systems like ARIES, has been the gold standard for relational database recovery. In this paper, we show that in modern high-throughput transaction processing systems, this is no longer the optimal way to recover a database system. In particular, as transaction throughputs get higher, ARIES-style logging starts to represent a non-trivial fraction of the overall transaction execution time. We propose a lighter-weight, coarse-grained command logging technique that records only the transactions that were executed on the database. Recovery then starts from a transactionally consistent checkpoint and replays the commands in the log as if they were new transactions. By avoiding the overhead of fine-grained logging of before and after images (both the CPU cost and the substantial associated I/O), command logging can yield significantly higher throughput at run-time. Recovery times for command logging are longer than for an ARIES-style physiological logging approach, but with the advent of high-availability techniques that can mask the outage of a recovering node, recovery speed has become secondary in importance to run-time performance for most applications. We evaluated our approach on an implementation of TPC-C in a main memory database system (VoltDB), and found that command logging can offer 1.5x higher throughput than a main-memory optimized implementation of ARIES-style physiological logging.
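The contrast between the two logging styles can be illustrated with a small Python sketch. It is not VoltDB's implementation or API. It assumes a single in-memory key-value table, serial transaction execution, and deterministic stored procedures; the procedures `deposit`, `transfer`, and `apply_interest` and the class `CommandLoggingDB` are hypothetical names chosen for illustration. The sketch writes one log record per transaction, takes a checkpoint between transactions, and recovers by re-executing the commands logged after that checkpoint. For comparison, it also counts the per-record before/after images that an ARIES-style physiological log would need for the same work. Forcing the log to stable storage and real crash handling are omitted.

```python
import copy
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# Hypothetical stored procedures. Command logging assumes they are
# deterministic: same starting state + same arguments -> same resulting state.
def deposit(db: Dict[str, int], account: str, amount: int) -> None:
    db[account] = db.get(account, 0) + amount

def transfer(db: Dict[str, int], src: str, dst: str, amount: int) -> None:
    if db.get(src, 0) >= amount:  # insufficient funds -> no-op, which replays identically
        db[src] = db.get(src, 0) - amount
        db[dst] = db.get(dst, 0) + amount

def apply_interest(db: Dict[str, int], pct: int) -> None:
    # A single command that modifies every record in the table.
    for account in db:
        db[account] = db[account] * (100 + pct) // 100

PROCEDURES = {"deposit": deposit, "transfer": transfer, "apply_interest": apply_interest}

@dataclass
class CommandLoggingDB:
    data: Dict[str, int] = field(default_factory=dict)
    # Command log: (procedure name, arguments), one entry per transaction.
    command_log: List[Tuple[str, tuple]] = field(default_factory=list)
    # For comparison only: (key, before image, after image) records that an
    # ARIES-style physiological log would have to capture for the same work.
    physio_log: List[Tuple[str, Optional[int], Optional[int]]] = field(default_factory=list)
    checkpoint: Dict[str, int] = field(default_factory=dict)
    checkpoint_lsn: int = 0  # index of the first command not reflected in the checkpoint

    def execute(self, proc: str, *args) -> None:
        # A real system must make this record durable before acknowledging commit.
        self.command_log.append((proc, args))
        before = dict(self.data)
        PROCEDURES[proc](self.data, *args)
        for key in sorted(before.keys() | self.data.keys()):
            if before.get(key) != self.data.get(key):
                self.physio_log.append((key, before.get(key), self.data.get(key)))

    def take_checkpoint(self) -> None:
        # Transactionally consistent because execution is serial and the
        # snapshot is taken between transactions.
        self.checkpoint = copy.deepcopy(self.data)
        self.checkpoint_lsn = len(self.command_log)

    def recover(self) -> Dict[str, int]:
        # Start from the checkpoint and re-execute the logged commands,
        # in their original order, as if they were new transactions.
        db = copy.deepcopy(self.checkpoint)
        for proc, args in self.command_log[self.checkpoint_lsn:]:
            PROCEDURES[proc](db, *args)
        return db

db = CommandLoggingDB()
db.execute("deposit", "alice", 100)
db.execute("deposit", "bob", 50)
db.take_checkpoint()
db.execute("transfer", "alice", "bob", 30)
db.execute("apply_interest", 10)
recovered = db.recover()  # rebuild the state from checkpoint + command log
print(recovered)                                # {'alice': 77, 'bob': 88}
print(len(db.command_log), len(db.physio_log))  # 4 6
```

Replay reproduces the pre-crash state only because each procedure is deterministic and the commands are re-executed in their original serial order. The `apply_interest` call shows why the command log stays small: a single log record covers an update to every row, while the physiological log grows with the number of records touched. The cost of this design is that recovery must re-execute the transactions instead of reapplying stored images.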
