Transitioning scientific applications to using non-volatile memory for resilience

Scientific applications often run for long periods of time and therefore frequently save their internal state to storage media in case of unexpected interruptions (e.g., hardware failures). Emerging non-volatile memory (NVRAM) can write up to 40× faster than traditional mechanical storage devices, making it an attractive medium for this purpose. This paper investigates the implications of transitioning a scientific application, Fluidanimate, to use NVRAM for fault tolerance. In particular, we evaluate the performance and ease of use of four fault-tolerance approaches: 1) logging through transactions, 2) multi-versioning through copy-on-write operations, 3) checkpointing through IO operations (e.g., fwrite) on a direct access (DAX) filesystem, and 4) checkpointing with a DRAM cache. Our study yields three key findings. First, the application requires additional changes to take advantage of the faster IO that NVRAM provides. Second, the performance of the approaches scales poorly within a single process. Third, NVRAM can increase reliability in a distributed computing environment by allowing individual nodes to fail and recover automatically before the rest of the system notices.
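The third approach, checkpointing through IO operations, can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: it assumes the application state is a flat array of doubles (a stand-in for Fluidanimate's particle data), the names `write_checkpoint` and `read_checkpoint` and the checkpoint path are hypothetical, and it uses Python for brevity where the paper's approach uses C's `fwrite`. It follows one common pattern: write to a temporary file, force it to the device with `fsync`, then atomically rename it over the previous checkpoint, so a crash mid-write never destroys the last complete checkpoint.

```python
import os
import struct

# Hypothetical on-disk layout: (timestep, count) as little-endian uint64s,
# followed by `count` little-endian doubles.
HEADER = struct.Struct("<QQ")

def write_checkpoint(path, step, values):
    """Persist `values` for `step`; after a crash, either the old or the new checkpoint survives."""
    tmp = path + ".tmp"
    with open(tmp, "wb") as f:
        f.write(HEADER.pack(step, len(values)))
        f.write(struct.pack(f"<{len(values)}d", *values))
        f.flush()             # move Python's userspace buffer into the OS
        os.fsync(f.fileno())  # ask the OS to make the file contents durable
    os.replace(tmp, path)     # atomic rename on POSIX filesystems
    # Persist the rename itself by syncing the directory (where supported).
    if hasattr(os, "O_DIRECTORY"):
        dir_fd = os.open(os.path.dirname(os.path.abspath(path)),
                         os.O_RDONLY | os.O_DIRECTORY)
        try:
            os.fsync(dir_fd)
        finally:
            os.close(dir_fd)

def read_checkpoint(path):
    """Return (step, values) from the most recent complete checkpoint."""
    with open(path, "rb") as f:
        step, count = HEADER.unpack(f.read(HEADER.size))
        values = list(struct.unpack(f"<{count}d", f.read(8 * count)))
    return step, values

if __name__ == "__main__":
    # Hypothetical path; on a DAX filesystem this would live on the NVRAM mount.
    ckpt = "fluid.ckpt"
    write_checkpoint(ckpt, 42, [0.5, 1.25, -3.0])
    print(read_checkpoint(ckpt))  # (42, [0.5, 1.25, -3.0])
```

The design favors simplicity and crash safety over speed: every checkpoint rewrites and syncs the full state, which is the kind of IO cost whose reduction on NVRAM the study measures.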
