Robust Shared Objects for Non-Volatile Main Memory

Research in concurrent in-memory data structures has focused almost exclusively on models where processes are either reliable, or may fail by crashing permanently. The case where processes may recover from failures has received little attention because recovery from conventional volatile memory is impossible in the event of a system crash, during which both the state of main memory and the private states of processes are lost. Future hardware architectures are likely to include various forms of non-volatile random access memory (NVRAM), creating new opportunities to design robust main memory data structures that can recover from system crashes. In this paper we advance the theoretical foundations of such data structures in two ways. First, we review several known variations of Herlihy and Wing's linearizability property that were proposed in the context of message passing systems but also apply in our NVRAM-based model, we discuss the limitations of these properties with respect to our specific goals, and we propose an alternative correctness condition called recoverable linearizability. Second, we discuss techniques for implementing shared objects that satisfy such properties with a focus on wait-free implementations. Specifically, we demonstrate how to achieve different variations of linearizability in our model by transforming two classic wait-free constructions.

[1]  Richard D. Schlichting,et al.  Fail-stop processors: an approach to designing fault-tolerant computing systems , 1983, TOCS.

[2]  Marcos K. Aguilera,et al.  Strict Linearizability and the Power of Aborting , 2003 .

[3]  Arif Merchant,et al.  FAB: building distributed enterprise disk arrays from commodity components , 2004, ASPLOS XI.

[4]  Nancy A. Lynch,et al.  Impossibility of distributed consensus with one faulty process , 1985, JACM.

[5]  Michael M. Swift,et al.  Mnemosyne: lightweight persistent memory , 2011, ASPLOS XVI.

[6]  Maurice Herlihy,et al.  The art of multiprocessor programming , 2020, PODC '06.

[7]  Marina Papatriantafilou,et al.  Self-Stabilization of Wait-Free Shared Memory Objects , 1995, WDAG.

[8]  Amos Israeli,et al.  Bounded time-stamps , 1987, 28th Annual Symposium on Foundations of Computer Science (sfcs 1987).

[9]  Terence Kelly,et al.  Failure-atomic msync(): a simple and efficient mechanism for preserving the integrity of durable data , 2013, EuroSys '13.

[10]  Engin Ipek,et al.  Dynamically replicated memory: building reliable systems from nanoscale resistive memories , 2010, ASPLOS 2010.

[11]  Sam Toueg,et al.  Fault-tolerant wait-free shared objects , 1992, Proceedings., 33rd Annual Symposium on Foundations of Computer Science.

[12]  Hamid Pirahesh,et al.  ARIES: a transaction recovery method supporting fine-granularity locking and partial rollbacks using write-ahead logging , 1998 .

[13]  Rachid Guerraoui,et al.  Robust emulations of shared memory in a crash-recovery model , 2004, 24th International Conference on Distributed Computing Systems, 2004. Proceedings..

[14]  Michael Merritt,et al.  Computing with Infinitely Many Processes , 2000, DISC.

[15]  Sam Toueg,et al.  The weakest failure detector for solving consensus , 1992, PODC '92.

[16]  Arif Merchant,et al.  Building Storage Registers from Crash-Recovery Processes , 2003 .

[17]  Engin Ipek,et al.  Dynamically replicated memory: building reliable systems from nanoscale resistive memories , 2010, ASPLOS XV.

[18]  Juliane Junker,et al.  Computer Organization And Design The Hardware Software Interface , 2016 .

[19]  Maurice Herlihy,et al.  Linearizability: a correctness condition for concurrent objects , 1990, TOPL.

[20]  Edsger W. Dijkstra,et al.  Solution of a problem in concurrent programming control , 1965, CACM.

[21]  Christopher Frost,et al.  Better I/O through byte-addressable, persistent memory , 2009, SOSP '09.

[22]  Paolo Faraboschi,et al.  Operating System Support for NVM+DRAM Hybrid Main Memory , 2009, HotOS.

[23]  Rajesh K. Gupta,et al.  NV-Heaps: making persistent objects fast and safe with next-generation, non-volatile memories , 2011, ASPLOS XVI.

[24]  Lisa Higham,et al.  Fault-Tolerant Implementations of Regular Registers by Safe Registers with Applications to Networks , 2009, ICDCN.

[25]  Roy H. Campbell,et al.  Consistent and Durable Data Structures for Non-Volatile Byte-Addressable Memory , 2011, FAST.

[26]  Leslie Lamport,et al.  On Interprocess Communication-Part I: Basic Formalism, Part II: Algorithms , 2016 .

[27]  Thomas F. Wenisch,et al.  Memory persistency , 2014, 2014 ACM/IEEE 41st International Symposium on Computer Architecture (ISCA).

[28]  David S. Greenberg,et al.  Computing with faulty shared memory , 1992, PODC '92.

[29]  Roy Friedman,et al.  On the Locality of Consistency Conditions , 2003, DISC.

[30]  Marcos K. Aguilera,et al.  Failure detection and consensus in the crash-recovery model , 1998, Distributed Computing.

[31]  Thomas Moscibroda,et al.  Resilience of mutual exclusion algorithms to transient memory faults , 2011, PODC '11.