Efficient Checkpointing with Recompute Scheme for Non-volatile Main Memory

Future main memory will likely include Non-Volatile Memory. Non-Volatile Main Memory (NVMM) provides an opportunity to rethink checkpointing strategies for providing failure safety to applications. While there are many checkpointing and logging schemes in the literature, their use must be revisited as they incur high execution time overheads as well as a large number of additional writes to NVMM, which may significantly impact write endurance. In this article, we propose a novel recompute-based failure safety approach and demonstrate its applicability to loop-based code. Rather than keeping a fully consistent logging state, we only log enough state to enable recomputation. Upon a failure, our approach recovers to a consistent state by determining which parts of the computation were not completed and recomputing them. Effectively, our approach removes the need to keep checkpoints or logs, thus reducing execution time overheads and improving NVMM write endurance at the expense of more complex recovery. We compare our new approach against logging and checkpointing on five scientific workloads, including tiled matrix multiplication, on a computer system model that was built on gem5 and supports Intel PMEM instruction extensions. For tiled matrix multiplication, our recompute approach incurs an execution time overhead of only 5%, in contrast to 8% overhead with logging and 207% overhead with checkpointing. Furthermore, recompute only adds 7% additional NVMM writes, compared to 111% with logging and 330% with checkpointing. We also conduct experiments on real hardware, allowing us to run our workloads to completion while varying the number of threads used for computation. These experiments substantiate our simulation-based observations and provide a sensitivity study and performance comparison between the Recompute Scheme and Naive Checkpointing.

[1]  Terence Kelly,et al.  Failure-Atomic Persistent Memory Updates via JUSTDO Logging , 2016, ASPLOS.

[2]  Yan Solihin,et al.  Efficient Checkpointing of Loop-Based Codes for Non-volatile Main Memory , 2017, 2017 26th International Conference on Parallel Architectures and Compilation Techniques (PACT).

[3]  Weimin Zheng,et al.  DudeTM: Building Durable Transactions with Decoupling for Persistent Memory , 2017, ASPLOS.

[4]  Yan Solihin,et al.  ObfusMem: A low-overhead access obfuscation for trusted memories , 2017, 2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA).

[5]  Shoji Ikeda,et al.  2Mb Spin-Transfer Torque RAM (SPRAM) with Bit-by-Bit Bidirectional Current Write and Parallelizing-Direction Current Read , 2007, 2007 IEEE International Solid-State Circuits Conference. Digest of Technical Papers.

[6]  Yan Solihin,et al.  Proteus: A Flexible and Fast Software Supported Hardware Logging approach for NVM , 2017, 2017 50th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[7]  Yan Solihin,et al.  Write-Aware Management of NVM-based Memory Extensions , 2016, ICS.

[8]  Mahmut T. Kandemir,et al.  Evaluating STT-RAM as an energy-efficient main memory alternative , 2013, 2013 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS).

[9]  Thomas F. Wenisch,et al.  High-Performance Transactions for Persistent Memories , 2016, ASPLOS.

[10]  Dhabaleswar K. Panda,et al.  MIC-Check: a distributed check pointing framework for the intel many integrated cores architecture , 2014, HPDC '14.

[11]  Dejan S. Milojicic,et al.  Optimizing Checkpoints Using NVM as Virtual Memory , 2013, 2013 IEEE 27th International Symposium on Parallel and Distributed Processing.

[12]  Moinuddin K. Qureshi Pay-As-You-Go: Low-overhead hard-error correction for phase change memories , 2011, 2011 44th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[13]  Yan Solihin,et al.  Lazy Persistency: A High-Performing and Write-Efficient Software Persistency Technique , 2018, 2018 ACM/IEEE 45th Annual International Symposium on Computer Architecture (ISCA).

[14]  Kurt B. Ferreira,et al.  A checkpoint compression study for high-performance computing systems , 2015, Int. J. High Perform. Comput. Appl..

[15]  Bronis R. de Supinski,et al.  Design, Modeling, and Evaluation of a Scalable Multi-level Checkpointing System , 2010, 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis.

[16]  R. Bez,et al.  An 8Mb demonstrator for high-density 1.8V Phase-Change Memories , 2004, 2004 Symposium on VLSI Circuits. Digest of Technical Papers (IEEE Cat. No.04CH37525).

[17]  Bianca Schroeder,et al.  Understanding failures in petascale computers , 2007 .

[18]  Stratis Viglas,et al.  ATOM: Atomic Durability in Non-volatile Memory through Hardware Logging , 2017, 2017 IEEE International Symposium on High Performance Computer Architecture (HPCA).

[19]  Hisashi Shima,et al.  Resistive Random Access Memory (ReRAM) Based on Metal Oxides , 2010, Proceedings of the IEEE.

[20]  Yuan Xie,et al.  Leveraging 3D PCRAM technologies to reduce checkpoint overhead for future exascale systems , 2009, Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis.

[21]  Stratis Viglas,et al.  Efficient persist barriers for multicores , 2015, 2015 48th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[22]  Yan Solihin,et al.  Non-volatile memory host controller interface performance analysis in high-performance I/O systems , 2015, 2015 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS).

[23]  Somesh Jha,et al.  Static analysis and compiler design for idempotent processing , 2012, PLDI.

[24]  Scott A. Mahlke,et al.  Sentinel scheduling for VLIW and superscalar processors , 1992, ASPLOS V.

[25]  Rajesh K. Gupta,et al.  NV-Heaps: making persistent objects fast and safe with next-generation, non-volatile memories , 2011, ASPLOS XVI.

[26]  Hamid Pirahesh,et al.  ARIES: a transaction recovery method supporting fine-granularity locking and partial rollbacks using write-ahead logging , 1998 .

[27]  Yan Solihin,et al.  Hiding the long latency of persist barriers using speculative execution , 2017, 2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA).

[28]  Karthikeyan Sankaralingam,et al.  Idempotent code generation: Implementation, analysis, and evaluation , 2013, Proceedings of the 2013 IEEE/ACM International Symposium on Code Generation and Optimization (CGO).

[29]  Thomas F. Wenisch,et al.  Memory persistency , 2014, 2014 ACM/IEEE 41st International Symposium on Computer Architecture (ISCA).

[30]  Rio Yokota,et al.  Accelerating Matrix Multiplication in Deep Learning by Using Low-Rank Approximation , 2017, 2017 International Conference on High Performance Computing & Simulation (HPCS).

[31]  Michael M. Swift,et al.  FlashVM: Virtual Memory Management on Flash , 2010, USENIX Annual Technical Conference.

[32]  Christopher Frost,et al.  Better I/O through byte-addressable, persistent memory , 2009, SOSP '09.

[33]  Yan Solihin,et al.  Silent Shredder: Zero-Cost Shredding for Secure Non-Volatile Main Memory Controllers , 2016, ASPLOS.

[34]  Devesh Tiwari,et al.  Compiler-Directed Lightweight Checkpointing for Fine-Grained Guaranteed Soft Error Recovery , 2016, SC16: International Conference for High Performance Computing, Networking, Storage and Analysis.

[35]  Somayeh Sardashti,et al.  The gem5 simulator , 2011, CARN.

[36]  Peter K. Szwed,et al.  Application-level checkpointing for shared memory programs , 2004, ASPLOS XI.

[37]  Patrick Th. Eugster,et al.  NVthreads: Practical Persistence for Multi-threaded Applications , 2017, EuroSys.

[38]  Youyou Lu,et al.  Loose-Ordering Consistency for persistent memory , 2014, 2014 IEEE 32nd International Conference on Computer Design (ICCD).

[39]  Jun Yang,et al.  Phase-Change Technology and the Future of Main Memory , 2010, IEEE Micro.

[40]  Michael L. Scott,et al.  iDO: Compiler-Directed Failure Atomicity for Nonvolatile Memory , 2018, 2018 51st Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[41]  Satish Narayanasamy,et al.  Persistency for synchronization-free regions , 2018, PLDI.

[42]  Karthikeyan Sankaralingam,et al.  Idempotent processor architecture , 2011, 2011 44th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[43]  Monica S. Lam,et al.  A data locality optimizing algorithm , 1991, PLDI '91.

[44]  Stratis Viglas,et al.  REWIND: Recovery Write-Ahead System for In-Memory Non-Volatile Data-Structures , 2015, Proc. VLDB Endow..

[45]  Hans-Juergen Boehm,et al.  Atlas: leveraging locks for non-volatile memory consistency , 2014, OOPSLA.

[46]  Anoop Gupta,et al.  The SPLASH-2 programs: characterization and methodological considerations , 1995, ISCA.

[47]  Andy Rudoff,et al.  Persistent Memory Programming , 2017, login Usenix Mag..

[48]  John L. Hennessy,et al.  The performance advantages of integrating block data transfer in cache-coherent multiprocessors , 1994, ASPLOS VI.

[49]  Michael M. Swift,et al.  Mnemosyne: lightweight persistent memory , 2011, ASPLOS XVI.

[50]  Youjip Won,et al.  NVWAL: Exploiting NVRAM in Write-Ahead Logging , 2016, ASPLOS.

[51]  Brian N. Bershad,et al.  Fast mutual exclusion for uniprocessors , 1992, ASPLOS V.

[52]  Yangqing Jia,et al.  Learning Semantic Image Representations at a Large Scale , 2014 .