CprFS: a user-level file system to support consistent file states for checkpoint and restart
[1] Satoshi Hoshina,et al. Fault recovery mechanism for multiprocessor servers , 1997, Proceedings of IEEE 27th International Symposium on Fault Tolerant Computing.
[2] Georg Stellner,et al. CoCheck: checkpointing and process migration for MPI , 1996, Proceedings of International Conference on Parallel Processing.
[3] Jason Nieh,et al. Proceedings of the 5th Symposium on Operating Systems Design and Implementation , 2002 .
[4] Dhabaleswar K. Panda,et al. Application-Transparent Checkpoint/Restart for MPI Programs over InfiniBand , 2006, 2006 International Conference on Parallel Processing (ICPP'06).
[5] Miron Livny,et al. Checkpoint and Migration of UNIX Processes in the Condor Distributed Processing System , 1997 .
[6] Rob VanderWijngaart,et al. NAS Parallel Benchmarks I/O Version 2.4 , 2002 .
[7] Hua Zhong,et al. CRAK: Linux Checkpoint/Restart As a Kernel Module , 2001 .
[8] Volker Strumpen,et al. Fault-Tolerant File-I/O for Portable Checkpointing Systems , 2000, The Journal of Supercomputing.
[9] Dan Pei,et al. Modification Operation Buffering : A Low-Overhead Approach to Checkpoint User Files , 1999 .
[10] Kuo-Bin Li,et al. ClustalW-MPI: ClustalW analysis using distributed and parallel computing , 2003, Bioinform..
[11] Heon Young Yeom,et al. Design and Implementation of Multiple Fault-Tolerant MPI over Myrinet (M^3) , 2005, ACM/IEEE SC 2005 Conference (SC'05).
[12] J. Duell. The design and implementation of Berkeley Lab's Linux checkpoint/restart , 2005 .
[13] Piyush Maheshwari,et al. Supporting Cost-Effective Fault Tolerance in Distributed Message-Passing Applications with File Operations , 1999, The Journal of Supercomputing.
[14] Yi-Min Wang,et al. Checkpointing and its applications , 1995, Twenty-Fifth International Symposium on Fault-Tolerant Computing. Digest of Papers.
[15] Jose Renato Santos,et al. Cruz: Application-Transparent Distributed Checkpoint-Restart on Standard Operating Systems , 2005, 2005 International Conference on Dependable Systems and Networks (DSN'05).
[16] Heon Young Yeom,et al. A user-transparent recoverable file system for distributed computing environment , 2005, CLADE 2005. Proceedings Challenges of Large Applications in Distributed Environments, 2005.
[17] Anthony Skjellum,et al. A High-Performance, Portable Implementation of the MPI Message Passing Interface Standard , 1996, Parallel Comput..
[18] Josep Torrellas,et al. ReViveI/O: efficient handling of I/O in highly-available rollback-recovery servers , 2006, The Twelfth International Symposium on High-Performance Computer Architecture, 2006.
[19] B. Bouteiller,et al. MPICH-V2: a Fault Tolerant MPI for Volatile Nodes based on Pessimistic Sender Based Message Logging , 2003, ACM/IEEE SC 2003 Conference (SC'03).
[20] Wenguang Chen,et al. Thckpt: Transparent Checkpointing of Linux Processes Under IA-64 , 2005, PDPTA.
[21] Jiwu Shu,et al. Parallel algorithm and implementation for realtime dynamic simulation of power system , 2005, 2005 International Conference on Parallel Processing (ICPP'05).
[22] Srinidhi Varadarajan,et al. DejaVu: transparent user-level checkpointing, migration and recovery for distributed systems , 2006, SC.
[23] S. Yajnik,et al. Checkpointing in CosMiC: a user-level process migration environment , 1997, Proceedings Pacific Rim International Symposium on Fault-Tolerant Systems.
[24] Ashwin Raju Jeyakumar. Metamori: A library for Incremental File Checkpointing , 2004 .
[25] Kai Li,et al. Libckpt: Transparent Checkpointing under UNIX , 1995, USENIX.
[26] Wu-chun Feng,et al. The design, implementation, and evaluation of mpiBLAST , 2003 .
[27] Bo Hong,et al. File System Workload Analysis For Large Scientific Computing Applications , 2004, MSST.
[28] Jason Duell,et al. The design and implementation of Berkeley Lab's Linux checkpoint/restart , 2005 .