Let's Talk About Storage & Recovery Methods for Non-Volatile Memory Database Systems

The advent of non-volatile memory (NVM) will fundamentally change the dichotomy between memory and durable storage in database management systems (DBMSs). These new NVM devices are almost as fast as DRAM, but all writes to it are potentially persistent even after power loss. Existing DBMSs are unable to take full advantage of this technology because their internal architectures are predicated on the assumption that memory is volatile. With NVM, many of the components of legacy DBMSs are unnecessary and will degrade the performance of data intensive applications. To better understand these issues, we implemented three engines in a modular DBMS testbed that are based on different storage management architectures: (1) in-place updates, (2) copy-on-write updates, and (3) log-structured updates. We then present NVM-aware variants of these architectures that leverage the persistence and byte-addressability properties of NVM in their storage and recovery methods. Our experimental evaluation on an NVM hardware emulator shows that these engines achieve up to 5.5X higher throughput than their traditional counterparts while reducing the amount of wear due to write operations by up to 2X. We also demonstrate that our NVM-aware recovery protocols allow these engines to recover almost instantaneously after the DBMS restarts.

[1]  Jr. Allen B. Tucker,et al.  The Computer Science and Engineering Handbook , 1997 .

[2]  D. Stewart,et al.  The missing memristor found , 2008, Nature.

[3]  Timothy J. Slegel,et al.  Transactional Memory Architecture and Implementation for IBM System Z , 2012, 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture.

[4]  Hasso Plattner,et al.  nvm malloc: Memory Allocation for NVRAM , 2015, ADMS@VLDB.

[5]  Philippas Tsigas,et al.  Lock-free deques and doubly linked lists , 2008, J. Parallel Distributed Comput..

[6]  Mendel Rosenblum,et al.  The design and implementation of a log-structured file system , 1991, SOSP '91.

[7]  Michael Stonebraker,et al.  A Prolegomenon on OLTP Database Systems for Non-Volatile Memory , 2014, ADMS@VLDB.

[8]  M. Hosomi,et al.  A novel nonvolatile memory with spin torque transfer magnetization switching: spin-ram , 2005, IEEE InternationalElectron Devices Meeting, 2005. IEDM Technical Digest..

[9]  Michael Stonebraker,et al.  Rethinking main memory OLTP recovery , 2014, 2014 IEEE 30th International Conference on Data Engineering.

[10]  Maurice Herlihy,et al.  A methodology for implementing highly concurrent data objects , 1993, TOPL.

[11]  Ismail Oukid,et al.  SOFORT: a hybrid SCM-DRAM storage engine for fast data recovery , 2014, DaMoN '14.

[12]  Robert H. Dennard,et al.  Challenges and future directions for the scaling of dynamic random-access memory (DRAM) , 2002, IBM J. Res. Dev..

[13]  Burton H. Bloom,et al.  Space/time trade-offs in hash coding with allowable errors , 1970, CACM.

[14]  Babak Falsafi,et al.  Shore-MT: a scalable storage manager for the multicore era , 2009, EDBT '09.

[15]  Michael M. Swift,et al.  Mnemosyne: lightweight persistent memory , 2011, ASPLOS XVI.

[16]  Matthias S. Müller,et al.  Memory Performance and Cache Coherency Effects on an Intel Nehalem Multiprocessor System , 2009, 2009 18th International Conference on Parallel Architectures and Compilation Techniques.

[17]  Nir Shavit,et al.  Software transactional memory , 1995, PODC '95.

[18]  Craig Freedman,et al.  Hekaton: SQL server's memory-optimized OLTP engine , 2013, SIGMOD '13.

[19]  Michael Stonebraker,et al.  The End of an Architectural Era (It's Time for a Complete Rewrite) , 2007, VLDB.

[20]  Irving L. Traiger,et al.  The Recovery Manager of the System R Database Manager , 1981, CSUR.

[21]  Ismail Oukid,et al.  Instant Recovery for Main Memory Databases , 2015, CIDR.

[22]  Shih-Hung Chen,et al.  Phase-change random access memory: A scalable technology , 2008, IBM J. Res. Dev..

[23]  Carlo Curino,et al.  Skew-aware automatic database partitioning in shared-nothing, parallel OLTP systems , 2012, SIGMOD Conference.

[24]  Ismail Oukid,et al.  FPTree: A Hybrid SCM-DRAM Persistent and Concurrent B-Tree for Storage Class Memory , 2016, SIGMOD Conference.

[25]  Patrick E. O'Neil,et al.  The log-structured merge-tree (LSM-tree) , 1996, Acta Informatica.

[26]  PavloAndrew,et al.  An empirical evaluation of in-memory multi-version concurrency control , 2017, VLDB 2017.

[27]  Qin Jin,et al.  Persistent B+-Trees in Non-Volatile Main Memory , 2015, Proc. VLDB Endow..

[28]  Mary Baker,et al.  Non-volatile memory for fast, reliable file systems , 1992, ASPLOS V.

[29]  Bingsheng He,et al.  NV-Tree: Reducing Consistency Cost for NVM-based Single Level Systems , 2015, FAST.

[30]  Adam Silberstein,et al.  Benchmarking cloud serving systems with YCSB , 2010, SoCC '10.

[31]  Parthasarathy Ranganathan,et al.  Consistent, durable, and safe memory management for byte-addressable non volatile main memory , 2013, TRIOS@SOSP.

[32]  Sudipta Sengupta,et al.  The Bw-Tree: A B-tree for new hardware platforms , 2013, 2013 IEEE 29th International Conference on Data Engineering (ICDE).

[33]  Ravi Krishnamurthy,et al.  The Case For Safe RAM , 1989, VLDB.

[34]  Kailash Gopalakrishnan,et al.  Overview of candidate device technologies for storage-class memory , 2008, IBM J. Res. Dev..

[35]  Michael Stonebraker,et al.  Anti-Caching: A New Approach to Database Management System Architecture , 2013, Proc. VLDB Endow..

[36]  Hamid Pirahesh,et al.  ARIES: a transaction recovery method supporting fine-granularity locking and partial rollbacks using write-ahead logging , 1998 .

[37]  Rajesh Gupta,et al.  From ARIES to MARS: transaction support for next-generation, solid-state drives , 2013, SOSP.

[38]  Trevor N. Mudge,et al.  A limits study of benefits from nanostore-based future data-centric system architectures , 2012, CF '12.

[39]  Philippe Bonnet,et al.  The Necessary Death of the Block Device Interface , 2013, CIDR.

[40]  Christopher J. Hughes,et al.  Performance evaluation of Intel® Transactional Synchronization Extensions for high-performance computing , 2013, 2013 SC - International Conference for High Performance Computing, Networking, Storage and Analysis (SC).

[41]  Viktor Leis,et al.  The ART of practical synchronization , 2016, DaMoN '16.

[42]  Hector Garcia-Molina,et al.  Main Memory Database Systems: An Overview , 1992, IEEE Trans. Knowl. Data Eng..

[43]  Jignesh M. Patel,et al.  Effect of node size on the performance of cache-conscious B+-trees , 2003, SIGMETRICS '03.

[44]  Karthick Rajamani,et al.  Energy Management for Commercial Servers , 2003, Computer.

[45]  Keir Fraser,et al.  Practical lock-freedom , 2003 .

[46]  Scott Hahn,et al.  A protected block device for Persistent Memory , 2014, 2014 30th Symposium on Mass Storage Systems and Technologies (MSST).

[47]  Sudipta Sengupta,et al.  High Performance Transactions in Deuteronomy , 2015, CIDR.

[48]  Eddie Kohler,et al.  Cache craftiness for fast multicore key-value storage , 2012, EuroSys '12.

[49]  C. Mohan,et al.  High performance database logging using storage class memory , 2011, 2011 IEEE 27th International Conference on Data Engineering.

[50]  James H. Anderson,et al.  Implementing wait-free objects on priority-based systems , 1997, PODC '97.

[51]  OHAD RODEH,et al.  B-trees, shadowing, and clones , 2008, TOS.

[52]  Keir Fraser,et al.  A Practical Multi-word Compare-and-Swap Operation , 2002, DISC.

[53]  Lin Ma,et al.  Self-Driving Database Management Systems , 2017, CIDR.

[54]  Michael J. Franklin,et al.  Concurrency Control and Recovery , 2014, Encyclopedia of Database Systems.

[55]  Ryan Stutsman,et al.  To Lock, Swap, or Elide: On the Interplay of Hardware Transactional Memory and Lock-Free Indexing , 2015, Proc. VLDB Endow..

[56]  Rajesh K. Gupta,et al.  NV-Heaps: making persistent objects fast and safe with next-generation, non-volatile memories , 2011, ASPLOS XVI.

[57]  Michael Stonebraker,et al.  Implementation techniques for main memory database systems , 1984, SIGMOD '84.

[58]  Christopher Frost,et al.  Better I/O through byte-addressable, persistent memory , 2009, SOSP '09.

[59]  Irving L. Traiger,et al.  System R: relational approach to database management , 1976, TODS.

[60]  Michael B. Greenwald,et al.  Two-handed emulation: how to build non-blocking implementations of complex data-structures using DCAS , 2002, PODC '02.

[61]  Amos Israeli,et al.  Disjoint-access-parallel implementations of strong shared memory primitives , 1994, PODC '94.

[62]  Hyojun Kim,et al.  Evaluating Phase Change Memory for Enterprise Storage Systems: A Study of Caching and Tiering Approaches , 2014, TOS.

[63]  Ryan Johnson,et al.  Scalable Logging through Emerging Non-Volatile Memory , 2014, Proc. VLDB Endow..

[64]  Viktor Leis,et al.  The adaptive radix tree: ARTful indexing for main-memory databases , 2013, 2013 IEEE 29th International Conference on Data Engineering (ICDE).

[65]  Roy H. Campbell,et al.  Consistent and Durable Data Structures for Non-Volatile Byte-Addressable Memory , 2011, FAST.

[66]  Sanjay Kumar,et al.  System software for persistent memory , 2014, EuroSys '14.

[67]  Michael Stonebraker,et al.  OLTP through the looking glass, and what we found there , 2008, SIGMOD Conference.

[68]  Jianliang Xu,et al.  PCMLogging: reducing transaction logging overhead with PCM , 2011, CIKM '11.

[69]  Suman Nath,et al.  Rethinking Database Algorithms for Phase Change Memory , 2011, CIDR.

[70]  Thomas F. Wenisch,et al.  Storage Management in the NVRAM Era , 2013, Proc. VLDB Endow..