Impact of Over-Decomposition on Coordinated Checkpoint/Rollback Protocol