The Parallel Persistent Memory Model

We consider a parallel computational model, the Parallel Persistent Memory model, consisting of P processors, each with a fast local ephemeral memory of limited size, that share a large persistent memory. The model allows each processor to fault at any time (with bounded probability) and possibly restart. When a processor faults, all of its state and local ephemeral memory are lost, but the persistent memory remains. This model is motivated by upcoming non-volatile memories that are nearly as fast as existing random access memory, are accessible at the granularity of cache lines, and can survive power outages. It is further motivated by the observation that in large parallel systems, failures of processors and their caches are not unusual.

We present several results for the model, using an approach that breaks a computation into capsules, each of which can be safely run multiple times. For the single-processor version, we describe how to simulate any program in the RAM, the external memory model, or the ideal-cache model with an expected constant-factor overhead. For the multiprocessor version, we describe how to efficiently implement a work-stealing scheduler within the model that handles both soft faults, in which a processor restarts, and hard faults, in which a processor permanently fails. For any multithreaded fork-join computation that is race free, is write-after-read conflict free, and has W work, D depth, and C maximum capsule work in the absence of faults, the scheduler guarantees an expected time bound on the model of $O\left(\frac{W}{P_A} + \frac{DP}{P_A}\left\lceil\log_{1/(Cf)} W\right\rceil\right)$, where P is the maximum number of processors, $P_A$ is the average number, and $f \leq 1/(2C)$ is the probability that a processor faults between successive persistent memory accesses. Within the model, and using the proposed methods, we develop efficient algorithms for parallel prefix sums, merging, sorting, and matrix multiply.
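The capsule idea can be illustrated with a minimal single-processor sketch. It is not the construction from the paper: it assumes a toy fault model in which every persistent-memory access independently faults with probability f before taking effect, and the names `Fault`, `PersistentMemory`, `prefix_sum_capsule`, and `run` are hypothetical. Sequential prefix sums are split into capsules; capsule i reads `out[i-1]` and `a[i]`, writes only `out[i]`, and then commits by advancing a persistent restart pointer `pc`. Because no capsule writes a location it has read, re-running a capsule after a fault produces the same result.

```python
import random


class Fault(Exception):
    """Models a processor fault: all ephemeral (local) state is lost."""


class PersistentMemory:
    """Toy persistent memory whose contents survive faults.

    Assumption for illustration: every access first faults with
    probability f, in which case the access does not take effect.
    """

    def __init__(self, f, rng):
        self.cells = {}
        self.f = f
        self.rng = rng

    def _maybe_fault(self):
        if self.rng.random() < self.f:
            raise Fault()

    def read(self, addr):
        self._maybe_fault()
        return self.cells[addr]

    def write(self, addr, value):
        self._maybe_fault()
        self.cells[addr] = value


def prefix_sum_capsule(mem, i):
    """Capsule i: out[i] = out[i-1] + a[i], then commit.

    It reads out[i-1] and a[i] but writes only out[i], so it has no
    write-after-read conflict and re-running it yields the same result.
    """
    prev = mem.read(("out", i - 1)) if i > 0 else 0
    x = mem.read(("a", i))
    mem.write(("out", i), prev + x)
    # Commit point: advancing the persistent restart pointer ends the capsule.
    mem.write("pc", i + 1)


def run(a, f, seed=0):
    """Run sequential prefix sums as a chain of capsules under random faults."""
    rng = random.Random(seed)
    mem = PersistentMemory(f, rng)
    # Input and restart pointer are installed before faults are modeled.
    for i, x in enumerate(a):
        mem.cells[("a", i)] = x
    mem.cells["pc"] = 0
    restarts = 0
    while True:
        try:
            i = mem.read("pc")  # the restart pointer survives faults
            if i == len(a):
                break
            prefix_sum_capsule(mem, i)
        except Fault:
            # Treat all local state as lost; resume from the persistent pc.
            restarts += 1
    out = [mem.cells[("out", k)] for k in range(len(a))]
    return out, restarts


out, restarts = run([3, 1, 4, 1, 5, 9, 2, 6], f=0.1, seed=1)
print(out)  # [3, 4, 8, 9, 14, 23, 25, 31]
```

Each attempt makes at most five persistent accesses (the `pc` read plus four inside the capsule), so f = 0.1 roughly matches the condition $f \leq 1/(2C)$ if C is taken to count persistent accesses. The number of restarts depends on the seed, but the output does not.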
