Scalable Logging through Emerging Non-Volatile Memory

Emerging byte-addressable, non-volatile memory (NVM) is fundamentally changing the design principle of transaction logging. It potentially invalidates the need for flush-before-commit as log records are persistent immediately upon write. Distributed logging---a once prohibitive technique for single node systems in the DRAM era---becomes a promising solution to easing the logging bottleneck because of the non-volatility and high performance of NVM. In this paper, we advocate NVM and distributed logging on multicore and multi-socket hardware. We identify the challenges brought by distributed logging and discuss solutions. To protect committed work in NVM-based systems, we propose passive group commit, a lightweight, practical approach that leverages existing hardware and group commit. We expect that durable processor cache is the ultimate solution to protecting committed work and building reliable, scalable NVM-based systems in general. We evaluate distributed logging with logging-intensive workloads and show that distributed logging can achieve as much as ~3x speedup over centralized logging in a modern DBMS and that passive group commit only induces minuscule overhead.

[1]  Vijayalakshmi Srinivasan,et al.  Enhancing lifetime and security of PCM-based Main Memory with Start-Gap Wear Leveling , 2009, 2009 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[2]  GraefeGoetz A survey of B-tree logging and recovery techniques , 2012 .

[3]  Orion Hodson,et al.  Whole-system persistence , 2012, ASPLOS XVII.

[4]  Dhruva R. Chakrabarti,et al.  Implications of CPU Caching on Byte-addressable Non-Volatile Memory Programming , 2012 .

[5]  Suman Nath,et al.  Rethinking Database Algorithms for Phase Change Memory , 2011, CIDR.

[6]  Thomas F. Wenisch,et al.  Storage Management in the NVRAM Era , 2013, Proc. VLDB Endow..

[7]  Shimin Chen,et al.  FlashLogging: exploiting flash devices for synchronous logging performance , 2009, SIGMOD Conference.

[8]  M. Hosomi,et al.  A novel nonvolatile memory with spin torque transfer magnetization switching: spin-ram , 2005, IEEE InternationalElectron Devices Meeting, 2005. IEDM Technical Digest..

[9]  C. Mohan,et al.  High performance database logging using storage class memory , 2011, 2011 IEEE 27th International Conference on Data Engineering.

[10]  Jianliang Xu,et al.  Flag Commit: Supporting Efficient Transaction Recovery in Flash-Based DBMSs , 2012, IEEE Transactions on Knowledge and Data Engineering.

[11]  Leslie Lamport,et al.  Time, clocks, and the ordering of events in a distributed system , 1978, CACM.

[12]  Babak Falsafi,et al.  Shore-MT: a scalable storage manager for the multicore era , 2009, EDBT '09.

[13]  Michael Stonebraker,et al.  The End of an Architectural Era (It's Time for a Complete Rewrite) , 2007, VLDB.

[14]  Chita R. Das,et al.  Cache revive: Architecting volatile STT-RAM caches for enhanced performance in CMPs , 2012, DAC Design Automation Conference 2012.

[15]  Markus Kirchberg,et al.  C-ARIES: A Multi-threaded Version of the ARIES Recovery Algorithm , 2007, DEXA.

[16]  Dudley Allen Buck,et al.  Ferroelectrics for Digital Information Storage and Switching , 1952 .

[17]  Hyunjin Lee,et al.  Flip-N-Write: A simple deterministic technique to improve PRAM write performance, energy and endurance , 2009, 2009 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[18]  Takayuki Kawahara,et al.  Scalable Spin-Transfer Torque RAM Technology for Normally-Off Computing , 2011, IEEE Design & Test of Computers.

[19]  Yuan Xie,et al.  Kiln: Closing the performance gap between systems with and without persistence support , 2013, 2013 46th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[20]  Jun Yang,et al.  A durable and energy efficient main memory using phase change memory technology , 2009, ISCA '09.

[21]  Ippokratis Pandis,et al.  PLP: Page Latch-free Shared-everything OLTP , 2011, Proc. VLDB Endow..

[22]  Hamid Pirahesh,et al.  ARIES: a transaction recovery method supporting fine-granularity locking and partial rollbacks using write-ahead logging , 1998 .

[23]  Ippokratis Pandis,et al.  Data-oriented transaction execution , 2010, Proc. VLDB Endow..

[24]  Peter M. Spiro How the Rdb � VMS Data Sharing System Became Fast , 1992 .

[25]  Christopher Frost,et al.  Better I/O through byte-addressable, persistent memory , 2009, SOSP '09.

[26]  Jun Yang,et al.  Improving write operations in MLC phase change memory , 2012, IEEE International Symposium on High-Performance Comp Architecture.

[27]  Ippokratis Pandis,et al.  Improving OLTP Scalability using Speculative Lock Inheritance , 2009, Proc. VLDB Endow..

[28]  Onur Mutlu,et al.  Architecting phase change memory as a scalable dram alternative , 2009, ISCA '09.

[29]  Ippokratis Pandis,et al.  Scalability of write-ahead logging on multicore and multisocket hardware , 2012, The VLDB Journal.

[30]  M. Breitwisch Phase Change Memory , 2008, 2008 International Interconnect Technology Conference.

[31]  Raghu Ramakrishnan,et al.  Database Management Systems , 1976 .

[32]  Goetz Graefe,et al.  A survey of B-tree logging and recovery techniques , 2012, ACM Trans. Database Syst..

[33]  Dmitri B. Strukov,et al.  Memristors for neural branch prediction: a case study in strict latency and write endurance challenges , 2013, CF '13.

[34]  Rajesh Gupta,et al.  From ARIES to MARS: transaction support for next-generation, solid-state drives , 2013, SOSP.

[35]  Arun Jagatheesan,et al.  Understanding the Impact of Emerging Non-Volatile Memories on High-Performance, IO-Intensive Computing , 2010, 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis.

[36]  Ippokratis Pandis,et al.  Aether: A Scalable Approach to Logging , 2010, Proc. VLDB Endow..

[37]  Jianliang Xu,et al.  PCMLogging: reducing transaction logging overhead with PCM , 2011, CIKM '11.

[38]  D. Stewart,et al.  The missing memristor found , 2008, Nature.