High availability for parallel computers
[1] Thomas Hérault, et al. MPICH-V Project: A Multiprotocol Automatic Fault-Tolerant MPI, 2006, Int. J. High Perform. Comput. Appl.
[2] L. Alvisi, et al. A Survey of Rollback-Recovery Protocols, 2002.
[3] Emilio Luque, et al. An Intelligent Management of Fault Tolerance in Cluster Using RADICMPI, 2006, PVM/MPI.
[4] V. Rajaraman, et al. A survey of checkpointing algorithms for parallel and distributed computers, 2000.
[5] Emilio Luque, et al. Increasing the cluster availability using RADIC, 2006, 2006 IEEE International Conference on Cluster Computing.
[6] Emilio Luque Fadón, et al. Outcomes of the fault tolerance configuration, 2009.
[7] Paolo Faraboschi, et al. COTSon: infrastructure for full system simulation, 2009, OPSR.
[8] Emilio Luque, et al. Challenges and Issues of the Integration of RADIC into Open MPI, 2009, PVM/MPI.
[9] Laxmikant V. Kale, et al. Proactive Fault Tolerance in Large Systems, 2004.
[10] Christian Engelmann, et al. Development of Naturally Fault Tolerant Algorithms for Computing on 100,000 Processors, 2002.
[11] Emilio Luque, et al. Providing Non-stop Service for Message-Passing Based Parallel Applications with RADIC, 2008, Euro-Par.
[12] Richard P. Martin, et al. Quantifying the performability of cluster-based services, 2005, IEEE Transactions on Parallel and Distributed Systems.
[13] William Gropp, et al. Fault Tolerance in Message Passing Interface Programs, 2004, Int. J. High Perform. Comput. Appl.
[14] Joel S. Emer, et al. The soft error problem: an architectural perspective, 2005, 11th International Symposium on High-Performance Computer Architecture.