Proactive Fault Tolerance Using Preemptive Migration
Christian Engelmann | Stephen L. Scott | Thomas Naughton | Geoffroy Vallée
[1] Bianca Schroeder, et al. Understanding failures in petascale computers, 2007.
[2] Jon Stearley, et al. Bad Words: Finding Faults in Spirit's Syslogs, 2008, 2008 Eighth IEEE International Symposium on Cluster Computing and the Grid (CCGRID).
[3] Laxmikant V. Kalé, et al. Proactive Fault Tolerance in MPI Applications Via Task Migration, 2006, HiPC.
[4] Christian Engelmann, et al. A Framework for Proactive Fault Tolerance, 2008, 2008 Third International Conference on Availability, Reliability and Security.
[5] David E. Culler, et al. The ganglia distributed monitoring system: design, implementation, and experience, 2004, Parallel Comput.
[6] Bert J. Debusschere, et al. Ovis-2: A robust distributed architecture for scalable RAS, 2008, 2008 IEEE International Symposium on Parallel and Distributed Processing.
[7] Christian Engelmann, et al. Proactive fault tolerance for HPC with Xen virtualization, 2007, ICS '07.
[8] B.P. Miller, et al. MRNet: A Software-Based Multicast/Reduction Network for Scalable Tools, 2003, ACM/IEEE SC 2003 Conference (SC'03).
[9] Cheng-Zhong Xu, et al. Exploring event correlation for failure prediction in coalitions of clusters, 2007, Proceedings of the 2007 ACM/IEEE Conference on Supercomputing (SC '07).
[10] Cong Du, et al. MPI-Mitten: Enabling Migration Technology in MPI, 2006, Sixth IEEE International Symposium on Cluster Computing and the Grid (CCGRID'06).
[11] Christian Engelmann, et al. Proactive process-level live migration in HPC environments, 2008, HiPC 2008.
[12] Stephen L. Scott, et al. Evaluation of fault-tolerant policies using simulation, 2007, 2007 IEEE International Conference on Cluster Computing.
[13] Zhiling Lan, et al. Towards a Fault-aware Computing Environment, 2008.