Probabilistic Fault-Containment

Research on fine-tuning stabilization properties has received attention for nearly a decade. This paper presents a probabilistic fault-containment algorithm that confines the effect of any single fault to the immediate neighborhood of the faulty process, with an expected recovery time of O(Δ³), where Δ is the maximum degree of a process in the network. The most significant aspect of the algorithm is that its fault-gap, defined as the smallest interval after which the system is ready to handle the next single fault with the same efficiency, depends only on Δ and is independent of the network size. We argue that a small fault-gap increases the availability of the fault-free system.
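The abstract does not spell out the algorithm, so the following is not the paper's method. It is a minimal Python sketch of the containment property the algorithm is claimed to guarantee: after a single fault at process p, every process whose state changes during recovery lies in p's closed neighborhood (p and its immediate neighbors). The adjacency-set graph representation and the names `closed_neighborhood`, `max_degree`, and `is_contained` are hypothetical, chosen only for illustration.

```python
from typing import Dict, Hashable, Iterable, Set

# Hypothetical representation: each process maps to the set of its neighbors.
Graph = Dict[Hashable, Set[Hashable]]


def closed_neighborhood(graph: Graph, p: Hashable) -> Set[Hashable]:
    """Return p together with its immediate neighbors."""
    return {p} | set(graph[p])


def max_degree(graph: Graph) -> int:
    """Delta: the largest number of neighbors of any single process."""
    return max((len(nbrs) for nbrs in graph.values()), default=0)


def is_contained(graph: Graph, faulty: Hashable, changed: Iterable[Hashable]) -> bool:
    """Check the containment property for one recovery run.

    `changed` lists the processes whose state changed while recovering
    from a single fault at `faulty`. Containment holds if all of them
    lie within the closed neighborhood of the faulty process.
    """
    return set(changed) <= closed_neighborhood(graph, faulty)


if __name__ == "__main__":
    # A ring of six processes: every process has exactly two neighbors.
    ring = {i: {(i - 1) % 6, (i + 1) % 6} for i in range(6)}
    print(max_degree(ring))                 # 2
    print(is_contained(ring, 0, [0, 1, 5])) # True: only p and its neighbors changed
    print(is_contained(ring, 0, [0, 2]))    # False: process 2 is two hops away
```

A checker like this could be run against simulation traces to confirm that a single fault never perturbs processes beyond distance one; it says nothing about the O(Δ³) expected recovery time or the fault-gap, which depend on the algorithm itself.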