Building Single Fault Survivable Parallel Algorithms for Matrix Operations Using Redundant Parallel Computation
Xuejun Yang | Jia Jia | Haifang Zhou | Yunfei Du | Hongyi Fu | Panfeng Wang
[1] Christian Engelmann, et al. Development of Naturally Fault Tolerant Algorithms for Computing on 100,000 Processors, 2002.
[2] Kai Li, et al. Diskless Checkpointing, 1998, IEEE Trans. Parallel Distributed Syst.
[3] Jacob A. Abraham, et al. Algorithm-Based Fault Tolerance for Matrix Operations, 1984, IEEE Transactions on Computers.
[4] William Gropp, et al. Dynamic process management in an MPI setting, 1995, Proceedings Seventh IEEE Symposium on Parallel and Distributed Processing.
[5] George Bosilca, et al. Recovery Patterns for Iterative Methods in a Parallel Unstable Environment, 2007, SIAM J. Sci. Comput.
[6] Zizhong Chen, et al. Algorithm-based checkpoint-free fault tolerance for parallel matrix computations on volatile resources, 2006, Proceedings 20th IEEE International Parallel & Distributed Processing Symposium.
[7] L. Alvisi, et al. A Survey of Rollback-Recovery Protocols, 2002.
[8] George Bosilca, et al. Fault tolerant high performance computing by a coding approach, 2005, PPoPP.