The Design and Implementation of a Fault-Tolerant Cluster Manager
[1] Priya Narasimhan, et al. The Eternal system: an architecture for enterprise applications, 1999, Proceedings Third International Enterprise Distributed Object Computing Conference.
[2] William H. Sanders, et al. AQuA: an adaptive architecture that provides dependable distributed objects, 1998, Proceedings Seventeenth IEEE Symposium on Reliable Distributed Systems.
[3] Charles L. Seitz, et al. Myrinet: A Gigabit-per-Second Local Area Network, 1995, IEEE Micro.
[4] Jonathan Robinson, et al. Hector: an agent based architecture for dynamic resource management, 1999, IEEE Concurr.
[5] Amnon Barak, et al. The MOSIX multicomputer operating system for high performance cluster computing, 1998, Future Gener. Comput. Syst.
[6] Miron Livny, et al. Condor - a hunter of idle workstations, 1988, Proceedings. The 8th International Conference on Distributed.
[7] Robbert van Renesse, et al. Horus: a flexible group communication system, 1996, CACM.
[8] Jingwen Wang, et al. Utopia: A load sharing facility for large, heterogeneous distributed computer systems, 1993, Softw. Pract. Exp.
[9] Yuval Tamir, et al. Fault-Tolerant Cluster Management for Reliable High-Performance Computing, 2001.
[10] P. A. Barrett. Delta-4: an open architecture for dependable systems, 1993.
[11] Flaviu Cristian, et al. A performance comparison of asynchronous atomic broadcast protocols, 1994, Distributed Syst. Eng.
[12] Louise E. Moser, et al. The Totem system, 1995, Twenty-Fifth International Symposium on Fault-Tolerant Computing. Digest of Papers.
[13] Danny Dolev, et al. The Transis approach to high availability cluster communication, 1996, CACM.
[14] Amin Vahdat, et al. GLUnix: a global layer unix for a network of workstations, 1998, Softw. Pract. Exp.
[15] P. Reynier, et al. Active replication in Delta-4, 1992, Digest of Papers. FTCS-22: The Twenty-Second International Symposium on Fault-Tolerant Computing.
[16] Miguel Castro, et al. Proactive recovery in a Byzantine-fault-tolerant system, 2000, OSDI.
[17] Kenneth P. Birman, et al. The process group approach to reliable distributed computing, 1992, CACM.
[18] Geoffrey C. Fox, et al. A Review of Commercial and Research Cluster Management Software, 1996.
[19] Henri E. Bal, et al. An efficient reliable broadcast protocol, 1989, OPSR.
[20] Leslie Lamport, et al. The Byzantine Generals Problem, 1982, TOPL.
[21] Ravishankar K. Iyer, et al. Chameleon: A Software Infrastructure for Adaptive Fault Tolerance, 1999, IEEE Trans. Parallel Distributed Syst.
[22] Fred B. Schneider, et al. Byzantine generals in action: implementing fail-stop processors, 1984, TOCS.