Transitioning scientific applications to using non-volatile memory for resilience
Jishen Zhao | Qing Yi | Jiange Zhang | Xiao Liu | Brandon Nesterenko