Paper information - Optimal Recovery Schemes for High-Availability Cluster and Distributed Computing

Clusters and distributed systems offer two important advantages: fault tolerance and high performance through load sharing. When all computers are up and running, we would like the load to be evenly distributed among them. When one or more computers break down, the load on those computers must be redistributed to the other computers in the cluster. This redistribution is determined by the recovery scheme. The recovery scheme should keep the load as evenly distributed as possible even when the most unfavorable combination of computers breaks down, i.e., we want to optimize the worst-case behavior. In this paper we define recovery schemes that are optimal for a number of important cases. We also define a bound on the performance of the recovery schemes for any number of computers.
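The worst-case criterion described in the abstract can be illustrated with a small brute-force sketch. This is not the paper's method: it assumes a simplified model in which every computer initially carries one unit of load, and a failed computer's unit moves to the first surviving computer in that computer's recovery list. The names `worst_case_load` and `ring_scheme` are hypothetical and introduced here only for illustration.

```python
from itertools import combinations


def worst_case_load(recovery_lists, failures):
    """Return the highest load on any surviving computer, taken over
    every combination of `failures` simultaneous breakdowns.

    Assumed model (not taken from the paper): each computer starts with
    one unit of load; recovery_lists[c] is the ordered list of computers
    that take over c's unit if c fails. Requires failures < number of
    computers and each list to name every other computer.
    """
    n = len(recovery_lists)
    worst = 1
    for down in combinations(range(n), failures):
        down_set = set(down)
        # Every surviving computer keeps its own unit of load.
        load = {c: 1 for c in range(n) if c not in down_set}
        for c in down:
            # The failed unit goes to the first surviving entry in c's list.
            target = next(t for t in recovery_lists[c] if t not in down_set)
            load[target] += 1
        worst = max(worst, max(load.values()))
    return worst


def ring_scheme(n):
    """A naive scheme: computer c fails over to c+1, then c+2, and so on."""
    return [[(c + step) % n for step in range(1, n)] for c in range(n)]
```

Under this model, a ring of four computers reaches a worst-case load of 3 when two neighbouring computers fail, whereas the recovery lists `[[1, 3, 2], [2, 0, 3], [3, 1, 0], [0, 2, 1]]` keep the worst case at 2. The difference shows why the choice of recovery scheme matters for worst-case behavior.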

Lars Lundberg | Charlie Svahnberg
