Probabilistic and temporal failure detectors for solving distributed problems

Abstract

Failure detectors (FDs) are celebrated for their modularity in solving distributed problems: algorithms are constructed using FDs as building blocks, and the synchrony assumptions needed to implement FDs are studied separately. These assumptions are typically expressed as eventual guarantees that must hold, after some point in time, forever and deterministically. In practice, however, they may hold only probabilistically and temporarily. This paper studies FDs in a realistic system N, where asynchrony is inflicted by probabilistic synchronous communication. We first address a problem with ⋄S, the weakest FD to solve consensus: an implementation of “consensus with probability 1” is possible in N without randomness in the algorithm, while an implementation of “⋄S with probability 1” is impossible in N. We introduce ⋄S⁎, a new FD with probabilistic and temporal accuracy. We prove that ⋄S⁎ (i) is implementable in N and (ii) can replace ⋄S in several existing deterministic consensus algorithms that use ⋄S, yielding algorithms that solve “consensus with probability 1”. We extend our results to other FD classes, e.g., ⋄P, and to a larger set of problems (beyond consensus), which we call decisive problems.
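The gap between eventual-forever accuracy and probabilistic, temporal accuracy can be illustrated with a small simulation. The sketch below is not the paper's model or proof. It assumes a hypothetical heartbeat-style detector in which a correct process sends one heartbeat per round, each heartbeat arrives in time independently with probability `p_timely`, and a late heartbeat causes a false suspicion. The function names, the round structure, and the independence assumption are all illustrative choices.

```python
import random


def simulate_heartbeats(p_timely, rounds, seed=0):
    """Simulate a monitor watching one correct process.

    Assumption (illustrative only): each round's heartbeat is timely
    independently with probability p_timely. Returns a list of booleans,
    True in every round where the monitor falsely suspects the process.
    """
    rng = random.Random(seed)
    # rng.random() is in [0.0, 1.0), so p_timely=1.0 never suspects
    # and p_timely=0.0 always suspects.
    return [rng.random() >= p_timely for _ in range(rounds)]


def longest_accurate_stretch(suspicions):
    """Length of the longest run of consecutive rounds with no false suspicion."""
    best = current = 0
    for suspected in suspicions:
        current = 0 if suspected else current + 1
        best = max(best, current)
    return best


def last_false_suspicion(suspicions):
    """Index of the last round with a false suspicion, or None if there is none."""
    for i in range(len(suspicions) - 1, -1, -1):
        if suspicions[i]:
            return i
    return None


if __name__ == "__main__":
    for rounds in (100, 10_000, 1_000_000):
        trace = simulate_heartbeats(p_timely=0.99, rounds=rounds, seed=42)
        print(rounds, last_false_suspicion(trace), longest_accurate_stretch(trace))
```

Under these assumptions, any `p_timely` strictly below 1 gives every round a fixed positive probability of a false suspicion, so false suspicions recur forever with probability 1 and no point in time exists after which the detector stays accurate. Long accurate stretches still occur, which is the kind of temporary, probabilistic accuracy that an FD in the spirit of ⋄S⁎ can rely on.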