On the weakest failure detector ever

Many problems in distributed computing are impossible when no information about process failures is available. It is common to ask what information about failures is necessary and sufficient to circumvent some specific impossibility, e.g., consensus, atomic commit, or mutual exclusion. This paper asks what information about failures is needed to circumvent any impossibility and sufficient to circumvent some impossibility. In other words, what is the minimal yet non-trivial failure information? We present an abstraction, denoted Υ, that provides very little failure information. In every run of the distributed system, Υ eventually informs the processes that some set of processes in the system cannot be the set of correct processes in that run. Υ may seem weak: it might provide random information for an arbitrarily long period of time, and it excludes only one candidate for the set of correct processes among many. Even so, Υ captures non-trivial failure information. We show that Υ is sufficient to circumvent the fundamental wait-free set-agreement impossibility. While doing so, we (a) disprove previous conjectures about the weakest failure detector to solve set-agreement and (b) prove that solving set-agreement with registers is strictly weaker than solving (n+1)-process consensus using n-process consensus. We prove that Υ is, in a precise sense, minimal to circumvent any wait-free impossibility. Roughly, we show that Υ is the weakest eventually stable failure detector to circumvent any wait-free impossibility. Our results are generalized through an abstraction Υf that we introduce and prove necessary to solve any problem that cannot be solved in an f-resilient manner, and yet sufficient to solve f-resilient f-set-agreement.
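As a rough illustration of the guarantee described above, the following Python sketch checks a finite, recorded trace of detector outputs against a Υ-like property. It is not the paper's formal definition: the names `stable_output` and `satisfies_upsilon`, the trace representation, and the `suffix` window are all hypothetical. The sketch also assumes that "eventually stable" means all correct processes end up outputting the same set, and a finite trace can only approximate an eventual property.

```python
from typing import Dict, FrozenSet, List, Optional

Process = int
Output = FrozenSet[Process]


def stable_output(
    history: Dict[Process, List[Output]],
    correct: FrozenSet[Process],
    suffix: int,
) -> Optional[Output]:
    """Return the single set that every correct process outputs during its
    last `suffix` steps, or None if there is no such common set.

    Assumption: `history[p]` lists the detector outputs observed at process p.
    """
    if suffix < 1:
        raise ValueError("suffix must be at least 1")
    seen = set()
    for p in correct:
        tail = history[p][-suffix:]
        if len(tail) < suffix:
            return None  # trace too short to judge stability
        seen.update(tail)
    # Exactly one distinct value means the outputs agree and are stable.
    return next(iter(seen)) if len(seen) == 1 else None


def satisfies_upsilon(
    history: Dict[Process, List[Output]],
    correct: FrozenSet[Process],
    suffix: int,
) -> bool:
    """Check the Υ-like property on a finite trace: the stable output exists
    and differs from the actual set of correct processes."""
    u = stable_output(history, correct, suffix)
    return u is not None and u != correct


# Hypothetical run: processes 0 and 1 are correct, process 2 crashes early.
correct = frozenset({0, 1})
history = {
    0: [frozenset({0, 1}), frozenset({2}), frozenset({0, 1, 2}), frozenset({0, 1, 2})],
    1: [frozenset({1}), frozenset({0, 1, 2}), frozenset({0, 1, 2}), frozenset({0, 1, 2})],
    2: [frozenset({0})],
}
print(satisfies_upsilon(history, correct, suffix=2))
```

In this run the early outputs are arbitrary, which the property tolerates. The outputs then settle on {0, 1, 2}, which rules out only one candidate for the correct set, and that candidate is indeed not the actual correct set {0, 1}.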
