Design and Evaluation of FA-MPI, a Transactional Resilience Scheme for Non-blocking MPI
[1] David Fiala. Detection and correction of silent data corruption for large-scale high-performance computing, 2012, 2012 International Conference for High Performance Computing, Networking, Storage and Analysis.
[2] Anthony Skjellum, et al. MPI/FT: A Model-Based Approach to Low-Overhead Fault Tolerant Message-Passing Middleware, 2004, Cluster Computing.
[3] Marcos K. Aguilera, et al. Heartbeat: A Timeout-Free Failure Detector for Quiescent Reliable Communication, 1997, WDAG.
[4] Jason Duell, et al. The LAM/MPI Checkpoint/Restart Framework: System-Initiated Checkpointing, 2005, Int. J. High Perform. Comput. Appl.
[5] Wesley Bland, et al. User Level Failure Mitigation in MPI, 2012, Euro-Par Workshops.
[6] W. Jia, et al. Fault-tolerant scaleable multicast algorithm with piggybacking approach on logical process ring, 1998.
[7] Jinsuk Chung, et al. Containment domains: a scalable, efficient, and flexible resilience scheme for exascale systems, 2012, HiPC 2012.
[8] James H. Laros, et al. Evaluating the viability of process replication reliability for exascale systems, 2011, 2011 International Conference for High Performance Computing, Networking, Storage and Analysis (SC).
[9] Bruce Jacob, et al. The structural simulation toolkit, 2006, PERV.
[10] Thomas Naughton, et al. A Log-Scaling Fault Tolerant Agreement Algorithm for a Fault Tolerant MPI, 2011, EuroMPI.
[11] Andreas Reuter, et al. Transaction Processing: Concepts and Techniques, 1992.
[12] Bianca Schroeder, et al. Understanding failures in petascale computers, 2007.
[13] Leslie G. Valiant, et al. A bridging model for parallel computation, 1990, CACM.
[14] Greg Bronevetsky, et al. Run-Through Stabilization: An MPI Proposal for Process Fault Tolerance, 2011, EuroMPI.
[15] Robbert van Renesse, et al. A Gossip-Style Failure Detection Service, 2009.