Relating Stabilizing Timing Assumptions to Stabilizing Failure Detectors Regarding Solvability and Efficiency

We investigate computational models with stabilizing properties. Such models include, for example, the partially synchronous model [Dwork et al. 1988], where after some unknown global stabilization time the system complies with bounds on computing speeds and message delays, and the asynchronous model augmented with unreliable failure detectors [Chandra et al. 1996], where after some unknown global stabilization time the failure detectors stop making mistakes. Using algorithm transformations (a notion we introduce in this paper), we show that many such models, and families of such models, are equivalent regarding solvability. We also analyze the efficiency of these transformations using two measures: the number of steps in a model M1 necessary to emulate a step in a model M2, and the stabilization shift, which bounds the number of steps in M2 required to provide the properties of M2 after M1 has stabilized.
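
To make the idea of a stabilization shift concrete, the following toy sketch emulates a failure detector on top of a timing assumption. It is an illustration under simplifying assumptions, not the transformation defined in the paper: a monitor watches a single correct process, time is counted in heartbeat rounds rather than steps, and the names `run_detector` and `stabilization_shift` as well as the timeout-increment rule are hypothetical.

```python
from typing import List, Tuple


def run_detector(delays: List[int], timeout: int = 1) -> Tuple[List[int], int]:
    """Monitor one correct process through its heartbeats.

    delays[r] is the delay of the heartbeat sent in round r (assumed model).
    A heartbeat slower than the current timeout causes a false suspicion,
    after which the timeout is increased by one.
    Returns the rounds with false suspicions and the final timeout.
    """
    mistakes = []
    for rnd, delay in enumerate(delays):
        if delay > timeout:
            mistakes.append(rnd)   # wrongly suspected a correct process
            timeout += 1           # adapt: be more patient next time
    return mistakes, timeout


def stabilization_shift(mistakes: List[int], gst: int) -> int:
    """Rounds needed after stabilization (round gst) until the last mistake."""
    late = [r for r in mistakes if r >= gst]
    return late[-1] - gst + 1 if late else 0


# Before round 3 delays happen to be small; from round 3 on they are
# bounded by 4, but the detector still has to learn a large enough timeout.
delays = [1, 1, 1] + [4] * 10
mistakes, final_timeout = run_detector(delays)
shift = stabilization_shift(mistakes, gst=3)
print(mistakes, final_timeout, shift)
```

In this run the detector keeps making mistakes for a few rounds after the delays have stabilized, because its timeout has not yet reached the bound. That lag between the stabilization of the underlying model and the stabilization of the emulated one is the kind of quantity the stabilization shift is meant to bound.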

[1]  Idit Keidar.  Challenges in evaluating distributed algorithms , 2003 .

[2]  Rachid Guerraoui,et al.  The overhead of consensus failure recovery , 2007, Distributed Computing.

[3]  Sam Toueg,et al.  The weakest failure detector for solving consensus , 1992, PODC '92.

[4]  Marcos K. Aguilera,et al.  On implementing omega with weak reliability and synchrony assumptions , 2003, PODC '03.

[5]  Marcos K. Aguilera,et al.  Communication-efficient leader election and consensus with limited link synchrony , 2004, PODC '04.

[6]  Rachid Guerraoui,et al.  How fast can eventual synchrony lead to consensus? , 2005, 2005 International Conference on Dependable Systems and Networks (DSN'05).

[7]  Rachid Guerraoui,et al.  Synchronous system and perfect failure detector: Solvability and efficiency issues , 2000, Proceeding International Conference on Dependable Systems and Networks. DSN 2000.

[8]  Nancy A. Lynch,et al.  Consensus in the presence of partial synchrony , 1988, JACM.

[9]  Seif Haridi,et al.  Distributed Algorithms , 1992, Lecture Notes in Computer Science.

[10]  Danny Dolev,et al.  On the minimal synchronism needed for distributed consensus , 1983, 24th Annual Symposium on Foundations of Computer Science (sfcs 1983).

[11]  Felix C. Freiling,et al.  Failure Detection Sequencers: Necessary and Sufficient Information about Failures to Solve Predicate Detection , 2002, DISC.

[12]  André Schiper,et al.  The Heard-Of Model: Unifying all Benign Failures , 2006 .

[13]  Hagit Attiya,et al.  Wiley Series on Parallel and Distributed Computing , 2004 .

[14]  Idit Keidar,et al.  Open Questions on Consensus Performance in Well-Behaved Runs , 2003, Future Directions in Distributed Computing.

[15]  Marcos K. Aguilera,et al.  Stable Leader Election , 2001, DISC.

[16]  Bernadette Charron-Bost,et al.  Simulating Reliable Links with Unreliable Links in the Presence of Process Crashes , 1996, WDAG.

[17]  Sébastien Tixeuil,et al.  Knowledge Connectivity vs. Synchrony Requirements for Fault-Tolerant Agreement in Unknown Networks , 2007, 37th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN'07).

[18]  Christof Fetzer,et al.  On the Possibility of Consensus in Asynchronous Systems with Finite Average Response Times , 2005, 25th IEEE International Conference on Distributed Computing Systems (ICDCS'05).

[19]  Ben Y. Zhao,et al.  Future Directions in Distributed Computing , 2003, Lecture Notes in Computer Science.

[20]  Nicola Santoro,et al.  Time is Not a Healer , 1989, STACS.

[22]  Nancy A. Lynch,et al.  Impossibility of distributed consensus with one faulty process , 1985, JACM.

[23]  Sam Toueg,et al.  Unreliable failure detectors for reliable distributed systems , 1996, JACM.

[24]  Martin Biely,et al.  Optimal Message-Driven Implementation of Omega with Mute Processes , 2006, SSS.

[25]  Dahlia Malkhi,et al.  Chasing the Weakest System Model for Implementing Ω and Consensus , 2009, IEEE Transactions on Dependable and Secure Computing.

[26]  Dahlia Malkhi,et al.  Omega Meets Paxos: Leader Election and Stability Without Eventual Timely Links , 2005, DISC.

[27]  André Schiper,et al.  Communication Predicates: A High-Level Abstraction for Coping with Transient and Dynamic Faults , 2007, 37th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN'07).

[28]  F. Mattern.  On the Relativistic Structure of Logical Time in Distributed Systems , 2009 .