Application-level checkpointing for shared memory programs
Peter K. Szwed | Daniel Marques | Martin Schulz | Keshav Pingali | Greg Bronevetsky
[1] Jason Duell,et al. The design and implementation of Berkeley Lab's Linux checkpoint/restart , 2005 .
[2] J. Duell. The design and implementation of Berkeley Lab's Linux checkpoint/restart , 2005 .
[3] Constantine Katsinis,et al. Fault-Tolerant Distributed-Shared-Memory on a Broadcast-Based Interconnection Network , 2000, IPDPS Workshops.
[4] Adam Beguelin,et al. Application Level Fault Tolerance in Heterogeneous Networks of Workstations , 1997 .
[5] Miron Livny,et al. Checkpoint and Migration of UNIX Processes in the Condor Distributed Processing System , 1997 .
[6] Daniel Marques,et al. Collective operations in application-level fault-tolerant MPI , 2003, ICS '03.
[7] Anoop Gupta,et al. The SPLASH-2 programs: characterization and methodological considerations , 1995, ISCA.
[8] Milo M. K. Martin,et al. SafetyNet: improving the availability of shared memory multiprocessors with global checkpoint/recovery , 2002, Proceedings 29th Annual International Symposium on Computer Architecture.
[9] Angelos Bilas,et al. Dynamic data replication: an approach to providing fault-tolerant shared memory clusters , 2003, The Ninth International Symposium on High-Performance Computer Architecture, 2003. HPCA-9 2003. Proceedings.
[10] Nathan Stone. A Checkpoint and Recovery System for the Pittsburgh Supercomputing Center Terascale Computing System , 2001 .
[11] Alan L. Cox,et al. TreadMarks: shared memory computing on networks of workstations , 1996 .
[12] Seif Haridi,et al. Distributed Algorithms , 1992, Lecture Notes in Computer Science.
[13] Message Passing Interface Forum. MPI: A message-passing interface standard , 1994 .
[14] Micah Beck,et al. Compiler-Assisted Checkpointing , 1994 .
[15] Josep Torrellas,et al. ReVive: cost-effective architectural support for rollback recovery in shared-memory multiprocessors , 2002, ISCA.
[16] Mitsuhisa Sato,et al. Design of OpenMP Compiler for an SMP Cluster , 1999 .
[17] Liviu Iftode,et al. Scalable Fault-Tolerant Distributed Shared Memory , 2000, ACM/IEEE SC 2000 Conference (SC'00).
[18] Leslie Lamport,et al. Distributed snapshots: determining global states of distributed systems , 1985, TOCS.
[19] Nian-Feng Tzeng,et al. Coherence-based coordinated checkpointing for software distributed shared memory systems , 2000, Proceedings 20th IEEE International Conference on Distributed Computing Systems.
[20] Georg Stellner,et al. CoCheck: checkpointing and process migration for MPI , 1996, Proceedings of International Conference on Parallel Processing.
[21] Miguel Castro,et al. Distributed shared object memory , 1993, Proceedings of IEEE 4th Workshop on Workstation Operating Systems. WWOS-III.
[22] Daniel Marques,et al. Collective Operations in an Application-level Fault Tolerant MPI System , 2003 .
[23] Miguel Castro,et al. A checkpoint protocol for an entry consistent shared memory system , 1994, PODC '94.
[24] Daniel Marques,et al. Automated application-level checkpointing of MPI programs , 2003, PPoPP '03.
[25] William R. Dieter,et al. A user-level checkpointing library for POSIX threads programs , 1999, Digest of Papers. Twenty-Ninth Annual International Symposium on Fault-Tolerant Computing (Cat. No.99CB36352).