Byzantine Anomaly Testing for Charm++: Providing Fault Tolerance and Survivability for Charm++ Empowered Clusters
[1] Laxmikant V. Kalé, et al. Proactive Fault Tolerance in Large Systems, 2004.
[2] Michael Treaster, et al. A Survey of Fault-Tolerance and Fault-Recovery Techniques in Parallel Systems, 2004, ArXiv.
[3] R. Dixon, et al. The n-queens problem, 1975, Discret. Math.
[4] Laxmikant V. Kalé, et al. Adaptive MPI, 2003, LCPC.
[5] Michael Nicolaidis, et al. Embedded robustness IPs for transient-error-free ICs, 2002, IEEE Design & Test of Computers.
[6] Seth Copen Goldstein, et al. Active messages: a mechanism for integrating communication and computation, 1998, ISCA '98.
[7] G. Robert Redinbo, et al. Fault-tolerant FFT data compression, 2000, Proceedings. 2000 Pacific Rim International Symposium on Dependable Computing.
[8] Neeraj Suri, et al. Advances in ULTRA-Dependable Distributed Systems, 1994.
[9] Thomas Hérault, et al. MPICH-V: Toward a Scalable Fault Tolerant MPI for Volatile Nodes, 2002, ACM/IEEE SC 2002 Conference (SC'02).
[10] Tracy Larrabee, et al. Beyond the byzantine generals: unexpected behavior and bridging fault diagnosis, 1996, Proceedings International Test Conference 1996. Test and Design Validity.
[11] Sean Keller, et al. SafeMPI - Extending MPI for Byzantine Error Detection on Parallel Clusters, 2005, ArXiv.
[12] Michael K. Reiter, et al. Fault detection for Byzantine quorum systems, 1999, Dependable Computing for Critical Applications 7.
[13] Cristian Constantinescu, et al. Impact of deep submicron technology on dependability of VLSI circuits, 2002, Proceedings International Conference on Dependable Systems and Networks.
[14] Håkan Sivencrona, et al. Byzantine Fault Tolerance, from Theory to Reality, 2003, SAFECOMP.
[15] Christian Engelmann, et al. Development of Naturally Fault Tolerant Algorithms for Computing on 100,000 Processors, 2002.
[16] Leslie Lamport, et al. The Byzantine Generals Problem, 1982, TOPL.
[17] Laxmikant V. Kalé, et al. CHARM++: a portable concurrent object oriented system based on C++, 1993, OOPSLA '93.
[18] Douglas M. Blough, et al. Fault-injection-based testing of fault-tolerant algorithms in message-passing parallel computers, 1997, Proceedings of IEEE 27th International Symposium on Fault Tolerant Computing.