CPPC: a compiler‐assisted tool for portable checkpointing of message‐passing applications
[1] Leslie Lamport, et al. Distributed snapshots: determining global states of distributed systems, 1985, TOCS.
[2] Nitin H. Vaidya, et al. Impact of Checkpoint Latency on Overhead Ratio of a Checkpointing Scheme, 1997, IEEE Trans. Computers.
[3] Georg Stellner, et al. CoCheck: checkpointing and process migration for MPI, 1996, Proceedings of International Conference on Parallel Processing.
[4] Tomás F. Pena, et al. Dual BEM for crack growth analysis on distributed-memory multiprocessors, 2000.
[5] Adrianos Lachanas, et al. MPI-FT: Portable Fault Tolerance Scheme for MPI, 2000, Parallel Process. Lett.
[6] Bülent Sankur, et al. Survey over image thresholding techniques and quantitative performance evaluation, 2004, J. Electronic Imaging.
[7] B. Ramkumar, et al. Portable checkpointing for heterogeneous architectures, 1997, Proceedings of IEEE 27th International Symposium on Fault Tolerant Computing.
[8] Gabriel Rodríguez, et al. Scalable Computing: Practice and Experience, 2008.
[9] E. N. Elnozahy, et al. Checkpointing for peta-scale systems: a look into the future of practical rollback-recovery, 2004, IEEE Transactions on Dependable and Secure Computing.
[10] L. Alvisi, et al. A Survey of Rollback-Recovery Protocols, 2002.
[11] Harrick M. Vin, et al. Egida: an extensible toolkit for low-overhead fault-tolerance, 1999, Digest of Papers. Twenty-Ninth Annual International Symposium on Fault-Tolerant Computing (Cat. No.99CB36352).
[12] W. Kent Fuchs, et al. Compiler-assisted full checkpointing, 1994, Softw. Pract. Exp.
[13] Rudolf Eigenmann, et al. Cetus - An Extensible Compiler Infrastructure for Source-to-Source Transformation, 2003, LCPC.
[14] Volker Strumpen, et al. Portable and fault-tolerant software systems, 1998, IEEE Micro.
[15] Roy Friedman, et al. Starfish: Fault-Tolerant Dynamic MPI Programs on Clusters of Workstations, 1999, Proceedings. The Eighth International Symposium on High Performance Distributed Computing (Cat. No.99TH8469).
[16] Özalp Babaoglu, et al. On the Optimum Checkpoint Selection Problem, 1984, SIAM J. Comput.
[17] Gabriel Rodríguez, et al. Compiler-assisted checkpointing of message-passing applications in heterogeneous environments, 2008.
[18] Micah Beck, et al. Compiler-Assisted Memory Exclusion for Fast Checkpointing, 1995.
[19] Keshav Pingali, et al. Mobile MPI programs in computational grids, 2006, PPoPP '06.
[20] B. Bouteiller, et al. MPICH-V2: a Fault Tolerant MPI for Volatile Nodes based on Pessimistic Sender Based Message Logging, 2003, ACM/IEEE SC 2003 Conference (SC'03).
[21] Hewijin Christine Jiau, et al. Process Recovery in Heterogeneous Systems, 2003, IEEE Trans. Computers.
[22] Gabriel Rodríguez, et al. Controller/Precompiler for Portable Checkpointing, 2006, IEICE Trans. Inf. Syst.
[23] Daniel Marques, et al. C3: A System for Automating Application-Level Checkpointing of MPI Programs, 2003, LCPC.
[24] Michel Raynal, et al. Consistency Issues in Distributed Checkpoints, 1999, IEEE Trans. Software Eng.
[25] Javier D. Bruguera, et al. High performance air pollution modeling for a power plant environment, 2003, Parallel Comput.
[26] Nitin H. Vaidya, et al. Staggered Consistent Checkpointing, 1999, IEEE Trans. Parallel Distributed Syst.
[27] Song Jiang, et al. Transparent, Incremental Checkpointing at Kernel Level: a Foundation for Fault Tolerance for Parallel Computers, 2005, ACM/IEEE SC 2005 Conference (SC'05).