Easy Lock-Free Indexing in Non-Volatile Memory

Large non-volatile memories (NVRAM) will change the durability and recovery mechanisms of main-memory database systems. Today, these systems make operations durable through logging and checkpointing to secondary storage, and recover by rebuilding the in-memory database (records and indexes) from on-disk state. A main-memory database stored in NVRAM, however, can potentially recover instantly after a power failure. Modern main-memory databases typically use lock-free index structures to enable a high degree of concurrency. Thus NVRAM-resident databases need indexes that are both lock-free, persistent, and able to recover (almost) instantly after a crash. In this paper, we show how to easily build such index structures. A key enabling component of our scheme is a multi-word compare-and-swap operation, PMwCAS, that is lock-free, persistent, and efficient. PMwCAS significantly reduces the complexity of building lock-free indexes, which we illustrate by implementing both doubly-linked skip lists and the Bw-tree lock-free B+-tree for NVRAM. Experimental results show that PMwCAS's runtime overhead is very low (~4–6% under realistic workloads). This overhead is sufficiently low that the same implementation can be used for both DRAM and NVRAM resident indexes.

[1]  Roy H. Campbell,et al.  Consistent and Durable Data Structures for Non-Volatile Byte-Addressable Memory , 2011, FAST.

[2]  Anas N. Al-Rabadi,et al.  A comparison of modified reconstructability analysis and Ashenhurst‐Curtis decomposition of Boolean functions , 2004 .

[3]  Qin Jin,et al.  Persistent B+-Trees in Non-Volatile Main Memory , 2015, Proc. VLDB Endow..

[4]  Håkan Sundell Wait-Free Multi-Word Compare-and-Swap Using Greedy Helping and Grabbing , 2011, International Journal of Parallel Programming.

[5]  Christopher J. Hughes,et al.  Performance evaluation of Intel® Transactional Synchronization Extensions for high-performance computing , 2013, 2013 SC - International Conference for High Performance Computing, Networking, Storage and Analysis (SC).

[6]  Keir Fraser,et al.  Practical lock-freedom , 2003 .

[7]  Stratis Viglas,et al.  REWIND: Recovery Write-Ahead System for In-Memory Non-Volatile Data-Structures , 2015, Proc. VLDB Endow..

[8]  Ismail Oukid,et al.  FPTree: A Hybrid SCM-DRAM Persistent and Concurrent B-Tree for Storage Class Memory , 2016, SIGMOD Conference.

[9]  Hans-Juergen Boehm,et al.  Makalu: fast recoverable allocation of non-volatile memory , 2016, OOPSLA.

[10]  Eddie Kohler,et al.  Cache craftiness for fast multicore key-value storage , 2012, EuroSys '12.

[11]  Philippas Tsigas,et al.  Lock-free deques and doubly linked lists , 2008, J. Parallel Distributed Comput..

[12]  Keir Fraser,et al.  A Practical Multi-word Compare-and-Swap Operation , 2002, DISC.

[13]  Chung H. Lam,et al.  Storage Class Memory , 2010, 2010 10th IEEE International Conference on Solid-State and Integrated Circuit Technology.

[14]  William Pugh,et al.  Skip Lists: A Probabilistic Alternative to Balanced Trees , 1989, WADS.

[15]  Sudipta Sengupta,et al.  The Bw-Tree: A B-tree for new hardware platforms , 2013, 2013 IEEE 29th International Conference on Data Engineering (ICDE).

[16]  John David Valois Lock-free data structures , 1996 .

[17]  Suman Nath,et al.  Rethinking Database Algorithms for Phase Change Memory , 2011, CIDR.

[18]  Michael L. Scott,et al.  Linearizability of Persistent Memory Objects Under a Full-System-Crash Failure Model , 2016, DISC.

[19]  M. Hosomi,et al.  A novel nonvolatile memory with spin torque transfer magnetization switching: spin-ram , 2005, IEEE InternationalElectron Devices Meeting, 2005. IEDM Technical Digest..

[20]  H. T. Kung,et al.  Concurrent manipulation of binary search trees , 1980, TODS.

[21]  Michael B. Greenwald,et al.  Two-handed emulation: how to build non-blocking implementations of complex data-structures using DCAS , 2002, PODC '02.

[22]  Viktor Leis,et al.  The adaptive radix tree: ARTful indexing for main-memory databases , 2013, 2013 IEEE 29th International Conference on Data Engineering (ICDE).

[23]  D. B. Lomet Process structuring, synchronization, and recovery using atomic actions , 1977 .

[24]  Ryan Stutsman,et al.  To Lock, Swap, or Elide: On the Interplay of Hardware Transactional Memory and Lock-Free Indexing , 2015, Proc. VLDB Endow..

[25]  Maged M. Michael Hazard pointers: safe memory reclamation for lock-free objects , 2004, IEEE Transactions on Parallel and Distributed Systems.

[26]  Rajesh K. Gupta,et al.  NV-Heaps: making persistent objects fast and safe with next-generation, non-volatile memories , 2011, ASPLOS XVI.

[27]  Michael M. Swift,et al.  Mnemosyne: lightweight persistent memory , 2011, ASPLOS XVI.

[28]  Hagit Attiya,et al.  Built-In Coloring for Highly-Concurrent Doubly-Linked Lists , 2006, DISC.

[29]  Michael L. Scott,et al.  Brief Announcement: Preserving Happens-before in Persistent Memory , 2016, SPAA.

[30]  Timothy L. Harris,et al.  A Pragmatic Implementation of Non-blocking Linked-Lists , 2001, DISC.

[31]  Bingsheng He,et al.  NV-Tree: Reducing Consistency Cost for NVM-based Single Level Systems , 2015, FAST.

[32]  Per-Åke Larson,et al.  BzTree: A High-Performance Latch-free Range Index for Non-Volatile Memory , 2018, Proc. VLDB Endow..

[33]  Craig Freedman,et al.  Hekaton: SQL server's memory-optimized OLTP engine , 2013, SIGMOD '13.

[34]  Viktor Leis,et al.  The ART of practical synchronization , 2016, DaMoN '16.

[35]  Ismail Oukid,et al.  Memory Management Techniques for Large-Scale Persistent-Main-Memory Systems , 2017, Proc. VLDB Endow..

[36]  Ismail Oukid,et al.  Instant Recovery for Main Memory Databases , 2015, CIDR.

[37]  Nir Shavit,et al.  Software transactional memory , 1995, PODC '95.

[38]  Philippas Tsigas,et al.  Reactive multiword synchronization for multiprocessors , 2003, 2003 12th International Conference on Parallel Architectures and Compilation Techniques.

[39]  Timothy J. Slegel,et al.  Transactional Memory Architecture and Implementation for IBM System Z , 2012, 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture.

[40]  Hasso Plattner,et al.  nvm malloc: Memory Allocation for NVRAM , 2015, ADMS@VLDB.