Cluster Survivability with ByzwATCh: A Byzantine Hardware Fault Detector for Parallel Machines with Charm++
[1] Michael K. Reiter,et al. Fault detection for Byzantine quorum systems , 1999, Dependable Computing for Critical Applications 7.
[2] Laxmikant V. Kalé,et al. Adaptive MPI , 2003, LCPC.
[3] Michael Nicolaidis,et al. Embedded robustness IPs for transient-error-free ICs , 2002, IEEE Design & Test of Computers.
[4] Laxmikant V. Kalé,et al. Proactive Fault Tolerance in Large Systems , 2004 .
[5] Cristian Constantinescu,et al. Impact of deep submicron technology on dependability of VLSI circuits , 2002, Proceedings International Conference on Dependable Systems and Networks.
[6] G. Robert Redinbo,et al. Fault-tolerant FFT data compression , 2000, Proceedings. 2000 Pacific Rim International Symposium on Dependable Computing.
[7] Neeraj Suri,et al. Advances in ULTRA-Dependable Distributed Systems , 1994 .
[8] Sean Keller,et al. SafeMPI - Extending MPI for Byzantine Error Detection on Parallel Clusters , 2005, ArXiv.
[9] Håkan Sivencrona,et al. Byzantine Fault Tolerance, from Theory to Reality , 2003, SAFECOMP.
[10] Thomas Hérault,et al. MPICH-V: Toward a Scalable Fault Tolerant MPI for Volatile Nodes , 2002, ACM/IEEE SC 2002 Conference (SC'02).
[11] Tracy Larrabee,et al. Beyond the byzantine generals: unexpected behavior and bridging fault diagnosis , 1996, Proceedings International Test Conference 1996. Test and Design Validity.
[12] Christian Engelmann,et al. Development of Naturally Fault Tolerant Algorithms for Computing on 100,000 Processors , 2002 .
[13] Leslie Lamport,et al. The Byzantine Generals Problem , 1982, TOPLAS.
[14] Laxmikant V. Kalé,et al. CHARM++: a portable concurrent object oriented system based on C++ , 1993, OOPSLA '93.
[15] Michael Treaster,et al. A Survey of Fault-Tolerance and Fault-Recovery Techniques in Parallel Systems , 2004, ArXiv.
[16] Laxmikant V. Kalé,et al. A fault tolerant protocol for massively parallel systems , 2004, 18th International Parallel and Distributed Processing Symposium, 2004. Proceedings.
[17] Douglas M. Blough,et al. Fault-injection-based testing of fault-tolerant algorithms in message-passing parallel computers , 1997, Proceedings of IEEE 27th International Symposium on Fault Tolerant Computing.