Scalable self-stabilization

The paper presents a method by which an asynchronous non-reactive distributed system can stabilize from a k-faulty configuration in a time that is a monotonically increasing function of k and independent of the size of the system. In the proposed methodology, processes first measure the size of the faulty regions, and then use this information to schedule actions in such a way that the faulty regions progressively shrink until they completely disappear. When k contiguous processes fail, the stabilization time is O(k^3). Otherwise, for small values of k, the stabilization time can be exponential in k, but it has an upper bound of O(n^3). The added space complexity per process is O(δ log_2 n), where δ is the maximum degree of a node.
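
The measure-then-shrink idea can be illustrated with a toy model. The sketch below is not the paper's algorithm: it assumes a line topology, a synchronous round model, and an oracle that marks which processes are faulty, none of which come from the paper. The function names (faulty_regions, shrink_once, rounds_to_stabilize) are hypothetical. It only shows how repairing faulty regions from their borders gives a recovery time that depends on the region size and not on n; it does not reproduce the O(k^3) bound.

```python
from typing import List


def faulty_regions(faulty: List[bool]) -> List[int]:
    """Return the sizes of maximal runs of faulty processes on a line."""
    sizes, run = [], 0
    for f in faulty:
        if f:
            run += 1
        elif run:
            sizes.append(run)
            run = 0
    if run:
        sizes.append(run)
    return sizes


def shrink_once(faulty: List[bool]) -> List[bool]:
    """One synchronous round (an assumption of this toy model):
    every faulty process with at least one correct neighbor is repaired."""
    n = len(faulty)
    nxt = list(faulty)
    for i, f in enumerate(faulty):
        if not f:
            continue
        left_ok = i > 0 and not faulty[i - 1]
        right_ok = i < n - 1 and not faulty[i + 1]
        if left_ok or right_ok:
            nxt[i] = False
    return nxt


def rounds_to_stabilize(faulty: List[bool]) -> int:
    """Count rounds until no faulty process remains."""
    if faulty and all(faulty):
        raise ValueError("no correct process is available to start the repair")
    rounds = 0
    while any(faulty):
        faulty = shrink_once(faulty)
        rounds += 1
    return rounds


if __name__ == "__main__":
    small = [False, True, True, True, False, False, True, False]
    print(faulty_regions(small), rounds_to_stabilize(small))

    # Same 3-process faulty region embedded in a much larger system.
    big = [False] * 1000
    big[500:503] = [True, True, True]
    print(faulty_regions(big), rounds_to_stabilize(big))
```

In this toy model a region of three contiguous faulty processes is repaired in the same number of rounds whether the system has 8 or 1000 processes, which is the kind of size-independence the abstract claims for the actual asynchronous method.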