Fine-grained checkpoint based on non-volatile memory

New non-volatile memory (e.g., phase-change memory) offers fast access, large capacity, byte-addressability, and non-volatility. Together, these properties provide fast, byte-addressable persistence, which opens new opportunities for fault tolerance. We propose a fine-grained checkpoint mechanism based on non-volatile memory. We extend the current virtual memory manager to manage non-volatile memory, and design a persistent heap that supports fast allocation and checkpointing of persistent objects. To achieve fine-grained checkpointing, we scatter objects across virtual pages and rely on hardware page protection to monitor modifications. In our system, two objects in different virtual pages may reside on the same physical page, and modifying one object does not interfere with monitoring of the other. This allows us to monitor and checkpoint objects smaller than a page (4096 bytes) at object granularity. Compared with previous page-granularity checkpoint mechanisms, our method can greatly reduce the amount of data copied at checkpoint time and make better use of the limited bandwidth of non-volatile memory.
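The aliasing idea described above can be illustrated in user space. The following is a minimal sketch, not the system proposed here: it assumes a POSIX platform, stands in a temporary file for non-volatile memory, and uses hypothetical names (`setup_views`, `object_ptr`, `checkpoint`, `OBJECT_SIZE`) that do not come from the paper. It maps one physical page at two virtual addresses, places one small object behind each virtual page, and uses `mprotect` plus a fault handler to record which object was written. Calling `mprotect` from a signal handler is common practice for this kind of write tracking but is not guaranteed async-signal-safe by POSIX.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#define NUM_OBJECTS 2
#define OBJECT_SIZE 64   /* hypothetical object size, far below one page */

static char *views[NUM_OBJECTS];                 /* one virtual page per object */
static volatile sig_atomic_t dirty[NUM_OBJECTS]; /* set on first write to a view */
static long page_size;

/* Object i lives at offset i * OBJECT_SIZE of the shared physical page,
 * but is only ever accessed through its own virtual page views[i]. */
static char *object_ptr(int i) { return views[i] + i * OBJECT_SIZE; }

/* Write-fault handler: mark the object whose view faulted, then unprotect it. */
static void on_write_fault(int sig, siginfo_t *info, void *ctx) {
    (void)sig; (void)ctx;
    char *addr = (char *)info->si_addr;
    for (int i = 0; i < NUM_OBJECTS; i++) {
        if (addr >= views[i] && addr < views[i] + page_size) {
            dirty[i] = 1;
            mprotect(views[i], page_size, PROT_READ | PROT_WRITE);
            return;                    /* the faulting store is retried */
        }
    }
    _exit(1);                          /* fault outside the tracked views */
}

/* Map the same page once per object, read-only. Returns 0 on success. */
static int setup_views(void) {
    page_size = sysconf(_SC_PAGESIZE);
    FILE *backing = tmpfile();         /* stand-in for an NVM region */
    if (!backing) return -1;
    int fd = fileno(backing);
    if (ftruncate(fd, page_size) != 0) return -1;
    for (int i = 0; i < NUM_OBJECTS; i++) {
        views[i] = mmap(NULL, page_size, PROT_READ, MAP_SHARED, fd, 0);
        if (views[i] == MAP_FAILED) return -1;
        dirty[i] = 0;
    }
    fclose(backing);                   /* mappings stay valid after close */

    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = on_write_fault;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGSEGV, &sa, NULL) != 0) return -1;
    return sigaction(SIGBUS, &sa, NULL);  /* some systems raise SIGBUS instead */
}

/* Copy only dirty objects into shadow and re-protect their views.
 * Returns the number of bytes copied. */
static size_t checkpoint(char *shadow) {
    size_t copied = 0;
    for (int i = 0; i < NUM_OBJECTS; i++) {
        if (!dirty[i]) continue;
        memcpy(shadow + i * OBJECT_SIZE, object_ptr(i), OBJECT_SIZE);
        copied += OBJECT_SIZE;
        dirty[i] = 0;
        mprotect(views[i], page_size, PROT_READ);
    }
    return copied;
}
```

In this sketch, writing to `object_ptr(0)` after `setup_views()` marks only object 0 as dirty, even though object 1 shares the same physical page, so `checkpoint` copies `OBJECT_SIZE` bytes where a page-granularity scheme would copy the whole page. The cost of the approach is one virtual page (and one page-table mapping) per tracked object.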
