Persistent B+-Trees in Non-Volatile Main Memory

Computer systems in the near future are expected to have N on- V olatile M ain M emory (NVMM), enabled by a new generation of N on- V olatile M emory (NVM) technologies, such as Phase Change Memory (PCM), STT-MRAM, and Memristor. The non-volatility property has the promise to persist in-memory data structures for instantaneous failure recovery. However, realizing such promise requires a careful design to ensure that in-memory data structures are in known consistent states after failures. This paper studies persistent in-memory B+-Trees as B+-Trees are widely used in database and data-intensive systems. While traditional techniques, such as undo-redo logging and shadowing, support persistent B+-Trees, we find that they incur drastic performance overhead because of extensive NVM writes and CPU cache flush operations. PCM-friendly B+-Trees with unsorted leaf nodes help mediate this issue, but the remaining overhead is still large. In this paper, we propose write atomic B+-Trees (wB+-Trees), a new type of main-memory B+-Trees, that aim to reduce such overhead as much as possible. wB+-Tree nodes employ a small indirect slot array and/or a bitmap so that most insertions and deletions do not require the movement of index entries. In this way, wB+-Trees can achieve node consistency either through atomic writes in the nodes or by redo-only logging. We model fast NVM using DRAM on a real machine and model PCM using a cycle-accurate simulator. Experimental results show that compared with previous persistent B+-Tree solutions, wB+-Trees achieve up to 8.8x speedups on DRAM-like fast NVM and up to 27.1x speedups on PCM for insertions and deletions while maintaining good search performance. Moreover, we replaced Memcached's internal hash index with tree indices. Our real machine Memcached experiments show that wB+-Trees achieve up to 3.8X improvements over previous persistent tree structures with undo-redo logging or shadowing.

[1]  Jae-Gil Lee,et al.  Blink: Not Your Father's Database! , 2011, BIRTE.

[2]  David Eisenstat,et al.  Lowering the Overhead of Nonblocking Software Transactional Memory , 2006 .

[3]  K. Gopalakrishnan,et al.  Phase change memory technology , 2010, 1001.1164.

[4]  Suman Nath,et al.  Rethinking Database Algorithms for Phase Change Memory , 2011, CIDR.

[5]  Thomas F. Wenisch,et al.  Storage Management in the NVRAM Era , 2013, Proc. VLDB Endow..

[6]  Kailash Gopalakrishnan,et al.  Overview of candidate device technologies for storage-class memory , 2008, IBM J. Res. Dev..

[7]  Peter M. Chen,et al.  Integrating reliable memory in databases , 1998, The VLDB Journal.

[8]  Orion Hodson,et al.  Whole-system persistence , 2012, ASPLOS XVII.

[9]  Jignesh M. Patel,et al.  Effect of node size on the performance of cache-conscious B+-trees , 2003, SIGMETRICS '03.

[10]  Jun Yang,et al.  A durable and energy efficient main memory using phase change memory technology , 2009, ISCA '09.

[11]  Vijayalakshmi Srinivasan,et al.  Scalable high performance main memory system using phase-change memory technology , 2009, ISCA '09.

[12]  Hasso Plattner,et al.  The Impact of Columnar In-Memory Databases on Enterprise Systems , 2014, Proc. VLDB Endow..

[13]  Onur Mutlu,et al.  Architecting phase change memory as a scalable dram alternative , 2009, ISCA '09.

[14]  Michael J. Franklin,et al.  Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing , 2012, NSDI.

[15]  Roy H. Campbell,et al.  Consistent and Durable Data Structures for Non-Volatile Byte-Addressable Memory , 2011, FAST.

[16]  Peter M. Chen,et al.  Free transactions with Rio Vista , 1997, SOSP.

[17]  Christopher Frost,et al.  Better I/O through byte-addressable, persistent memory , 2009, SOSP '09.

[18]  Rajesh K. Gupta,et al.  NV-Heaps: making persistent objects fast and safe with next-generation, non-volatile memories , 2011, ASPLOS XVI.

[19]  Michael M. Swift,et al.  Mnemosyne: lightweight persistent memory , 2011, ASPLOS XVI.

[20]  A. L. Narasimha Reddy,et al.  SCMFS: A file system for Storage Class Memory , 2011, 2011 International Conference for High Performance Computing, Networking, Storage and Analysis (SC).

[21]  R. Stanley Williams,et al.  Memristive devices in computing system: Promises and challenges , 2013, JETC.

[22]  Craig Freedman,et al.  Hekaton: SQL server's memory-optimized OLTP engine , 2013, SIGMOD '13.

[23]  Kenneth A. Ross,et al.  Making B+- trees cache conscious in main memory , 2000, SIGMOD '00.

[24]  H. V. Jagadish,et al.  Recovery Algorithms for Database Machines with Nonvolatile Main Memory , 1989, IWDM.

[25]  Parag Agrawal,et al.  The case for RAMCloud , 2011, Commun. ACM.

[26]  Michael Wu,et al.  eNVy: a non-volatile, main memory storage system , 1994, ASPLOS VI.

[27]  Stratis Viglas,et al.  Write-limited sorts and joins for persistent memory , 2014, Proc. VLDB Endow..

[28]  Aart J. C. Bik,et al.  Pregel: a system for large-scale graph processing , 2010, SIGMOD Conference.

[29]  Xueti Tang,et al.  Spin-transfer torque magnetic random access memory (STT-MRAM) , 2013, JETC.

[30]  Todd C. Mowry,et al.  Improving index performance through prefetching , 2001, SIGMOD '01.

[31]  Raghu Ramakrishnan,et al.  Database Management Systems , 1976 .