Transparent Redundant Computing with MPI
[1] Andrew Lumsdaine, et al. The Design and Implementation of Checkpoint/Restart Process Fault Tolerance for Open MPI, 2007, 2007 IEEE International Parallel and Distributed Processing Symposium.
[2] William Gropp, et al. Fault Tolerance in Message Passing Interface Programs, 2004, Int. J. High Perform. Comput. Appl.
[3] Emilio Luque, et al. Euro-Par 2008 - Parallel Processing, 14th International Euro-Par Conference, Las Palmas de Gran Canaria, Spain, August 26-29, 2008, Proceedings, 2008, Euro-Par.
[4] Jesús Labarta, et al. Scaling MPI to short-memory MPPs such as BG/L, 2006, ICS '06.
[5] Rolf Riesen, et al. See applications run and throughput jump: The case for redundant computing in HPC, 2010, 2010 International Conference on Dependable Systems and Networks Workshops (DSN-W).
[6] Emmanuel Jeannot, et al. Fault-Management in P2P-MPI, 2009, International Journal of Parallel Programming.
[7] B. Bouteiller, et al. MPICH-V2: a Fault Tolerant MPI for Volatile Nodes based on Pessimistic Sender Based Message Logging, 2003, ACM/IEEE SC 2003 Conference (SC'03).
[8] Zhiling Lan, et al. Reliability-aware scalability models for high performance computing, 2009, 2009 IEEE International Conference on Cluster Computing and Workshops.
[9] James H. Laros, et al. rMPI: increasing fault resiliency in a message-passing environment, 2011.
[10] Stéphane Genaud, et al. P2P-MPI: A Peer-to-Peer Framework for Robust Execution of Message Passing Parallel Programs on Grids, 2007, Journal of Grid Computing.
[11] Xin Chen, et al. Symmetric active/active metadata service for high availability parallel file systems, 2009, J. Parallel Distributed Comput.
[12] Jack J. Dongarra, et al. FT-MPI: Fault Tolerant MPI, Supporting Dynamic Applications in a Dynamic World, 2000, PVM/MPI.
[13] Bianca Schroeder, et al. Understanding failures in petascale computers, 2007.
[14] Emilio Luque, et al. Providing Non-stop Service for Message-Passing Based Parallel Applications with RADIC, 2008, Euro-Par.