SYSTEM SUPPORT FOR CHECKPOINT AND RESTART OF CHARM++ AND AMPI APPLICATIONS
[1] Gabriel Antoniu, et al. An Efficient and Transparent Thread Migration Scheme in the PM2 Runtime System, 1999, IPPS/SPDP Workshops.
[2] Kai Li, et al. Libckpt: Transparent Checkpointing under UNIX, 1995, USENIX.
[3] Laxmikant V. Kalé, et al. Multiparadigm, Multilingual Interoperability: Experience with Converse, 1998, IPPS/SPDP Workshops.
[4] Brian Randell, et al. System structure for software fault tolerance, 1975, IEEE Transactions on Software Engineering.
[5] Laxmikant V. Kalé, et al. Adaptive MPI, 2003, LCPC.
[6] Jason Duell, et al. The LAM/MPI Checkpoint/Restart Framework: System-Initiated Checkpointing, 2005, Int. J. High Perform. Comput. Appl.
[7] Laxmikant V. Kalé, et al. Supporting dynamic parallel object arrays, 2003, Concurr. Comput. Pract. Exp.
[8] Laxmikant V. Kalé, et al. FTC-Charm++: an in-memory checkpoint-based fault tolerant runtime for Charm++ and MPI, 2004, IEEE International Conference on Cluster Computing.
[9] Hai Jin, et al. Distributed Checkpointing on Clusters with Dynamic Striping and Staggering, 2002, ASIAN.
[10] Message Passing Interface Forum. MPI: A Message-Passing Interface Standard, 1994.
[11] Laxmikant V. Kalé, et al. Converse: an interoperable framework for parallel programming, 1996, Proceedings of International Conference on Parallel Processing.
[12] Nitin H. Vaidya, et al. Staggered Consistent Checkpointing, 1999, IEEE Trans. Parallel Distributed Syst.
[13] Laxmikant V. Kalé, et al. NAMD: Biomolecular Simulation on Thousands of Processors, 2002, ACM/IEEE SC 2002 Conference (SC'02).
[14] W. Kent Fuchs, et al. Checkpoint Space Reclamation for Uncoordinated Checkpointing in Message-Passing Systems, 1995, IEEE Trans. Parallel Distributed Syst.
[15] Larry Rudolph, et al. Parallel Job Scheduling: Issues and Approaches, 1995, JSSPP.
[16] Thomas Hérault, et al. MPICH-V: Toward a Scalable Fault Tolerant MPI for Volatile Nodes, 2002, ACM/IEEE SC 2002 Conference (SC'02).
[17] J. Duell. The design and implementation of Berkeley Lab's Linux checkpoint/restart, 2005.
[18] Kai Li, et al. CLIP: A Checkpointing Tool for Message Passing Parallel Programs, 1997, ACM/IEEE SC 1997 Conference (SC'97).
[19] Georg Stellner, et al. CoCheck: checkpointing and process migration for MPI, 1996, Proceedings of International Conference on Parallel Processing.
[20] B. Bouteiller, et al. MPICH-V2: a Fault Tolerant MPI for Volatile Nodes based on Pessimistic Sender Based Message Logging, 2003, ACM/IEEE SC 2003 Conference (SC'03).
[21] Leslie Lamport, et al. Distributed snapshots: determining global states of distributed systems, 1985, TOCS.