Aether: A Scalable Approach to Logging

The shift to multi-core hardware brings new challenges to database systems, because software parallelism now determines performance. Even though database systems traditionally accommodate simultaneous requests, a multitude of synchronization barriers serialize execution. Write-ahead logging is a fundamental, omnipresent component in ARIES-style concurrency and recovery, and one of the most important yet-to-be-addressed potential bottlenecks, especially in OLTP workloads that make frequent small changes to data. In this paper, we identify four logging-related impediments to database system scalability. Each issue challenges a different level of the software architecture: (a) the high volume of small I/O requests may saturate the disk, (b) transactions hold locks while waiting for the log flush, (c) extensive context switching overwhelms the OS scheduler with threads executing log I/Os, and (d) contention appears as transactions serialize accesses to in-memory log data structures. We demonstrate these problems and address them with techniques that, when combined, comprise a holistic, scalable approach to logging. Our solution achieves a 20%-69% speedup over a modern database system when running log-intensive workloads, such as the TPC-B and TATP benchmarks. Moreover, it achieves log insert throughput of over 1.8 GB/s for small log records on a single-socket server, an order of magnitude higher than the traditional approach of accessing the log through a single mutex.
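To make impediment (d) concrete, the following is a minimal sketch of the traditional baseline the abstract compares against: a log buffer guarded by a single mutex. It is not the technique proposed in the paper, only an illustration of where inserts serialize. The class name `SingleMutexLog`, the `insert` and `size` methods, and the use of the byte offset as a log sequence number are assumptions made for this example.

```python
import threading


class SingleMutexLog:
    """Hypothetical baseline: one mutex serializes every log insert."""

    def __init__(self):
        self._mutex = threading.Lock()
        self._buffer = bytearray()

    def insert(self, record: bytes) -> int:
        """Append a record; return its byte offset, used here as the LSN."""
        # Every thread must acquire the same lock, so inserts cannot overlap.
        with self._mutex:
            lsn = len(self._buffer)
            self._buffer += record
            return lsn

    def size(self) -> int:
        with self._mutex:
            return len(self._buffer)


def worker(log: SingleMutexLog, n_records: int, payload: bytes) -> None:
    # Simulates a transaction thread generating many small log records.
    for _ in range(n_records):
        log.insert(payload)


log = SingleMutexLog()
threads = [
    threading.Thread(target=worker, args=(log, 1000, b"x" * 32))
    for _ in range(4)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

The critical section covers both the space reservation and the copy of the record, so the time each thread holds the mutex grows with record size and the number of waiting threads grows with core count. That is the contention the abstract describes.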
