Compiler-support for Critical Data Persistence in NVM

Non-volatile main memories (NVMs) offer a promising way to keep data persistent and to recover a computation after a failure. Although NVMs can significantly reduce the overhead of failure recovery, as is the case for High-Performance Computing (HPC) kernels, rewriting existing programs or writing new applications for NVMs is non-trivial. In this article, we present compiler support that, based on a simple programmer directive, automatically inserts the complex instructions needed to make a kernel's data persistent in NVM. Unlike checkpointing techniques that store the whole system state, our technique persists only user-designated objects plus the parameters required for safe recovery, such as loop induction variables. It can also reduce the number of data transfer operations, because the compiler coalesces consecutive memory-persisting operations into a single memory transaction per cache line when possible. Our compiler support is implemented in the LLVM toolchain and applies the necessary modifications to loop-intensive computational kernels (e.g., TMM, LU, Gauss, and FFT) to force data persistence. Experiments show that the proposed compiler support outperforms the most recent checkpointing techniques while incurring insignificant performance overhead.
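
The coalescing of persist operations per cache line can be illustrated with a minimal C sketch. It is not the LLVM transformation described in the article. It only contrasts persisting every stored element with persisting each covered cache line once. The 64-byte `CACHE_LINE`, the counter `flush_count`, and the helper `persist_line` (a stand-in for a real cache-line write-back plus fence) are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed cache-line size in bytes; real hardware may differ. */
#define CACHE_LINE 64u

/* Hypothetical stand-in for a cache-line write-back followed by a fence.
   It only counts calls so the two strategies can be compared. */
static size_t flush_count = 0;
static void persist_line(const void *line_addr)
{
    (void)line_addr;
    flush_count++;
}

/* Naive strategy: one persist operation per stored element. */
static void persist_each(const double *a, size_t n)
{
    for (size_t i = 0; i < n; i++)
        persist_line(&a[i]);
}

/* Coalesced strategy: one persist operation per distinct cache line
   covered by the consecutive elements a[0] .. a[n-1]. */
static void persist_coalesced(const double *a, size_t n)
{
    if (n == 0)
        return;
    assert(a != NULL);

    const uintptr_t mask  = ~(uintptr_t)(CACHE_LINE - 1);
    const uintptr_t first = (uintptr_t)a & mask;             /* line of first byte */
    const uintptr_t last  = ((uintptr_t)(a + n) - 1) & mask; /* line of last byte  */

    for (uintptr_t p = first; p <= last; p += CACHE_LINE)
        persist_line((const void *)p);
}
```

Under these assumptions, for a 64-byte-aligned array of 16 `double` values (128 bytes), `persist_each` issues 16 persist operations while `persist_coalesced` issues 2, one per cache line.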
