Early stopping in global data computation

The Global Data Computation problem consists of providing each process with the same vector (with one entry per process) such that each entry is filled by a value provided by the corresponding process. This paper presents a protocol that solves this problem in an asynchronous distributed system in which processes can crash and which is equipped with a perfect failure detector. The protocol requires processes to execute asynchronous computation rounds. The number of rounds is upper bounded by $\min(f + 2, t + 1, n)$, where $n$, $t$, and $f$ represent the total number of processes, the maximum number of processes that can crash, and the number of processes that actually crash, respectively. This value is a lower bound for the number of rounds when $t < n - 1$. To our knowledge, this protocol is the first to achieve this lower bound. Interestingly, the protocol meets the same lower bound as the one required in synchronous systems.
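To make the round bound concrete, the following minimal sketch evaluates min(f + 2, t + 1, n) for a few parameter choices. It is only an arithmetic illustration, not the protocol itself; the function name `round_upper_bound` and the validity check 0 <= f <= t <= n - 1 are assumptions introduced here for the example.

```python
def round_upper_bound(n: int, t: int, f: int) -> int:
    """Return min(f + 2, t + 1, n), the upper bound on the number of rounds.

    n: total number of processes
    t: maximum number of processes that can crash
    f: number of processes that actually crash
    The range check below is an assumption of this sketch.
    """
    if not (0 <= f <= t <= n - 1):
        raise ValueError("expected 0 <= f <= t <= n - 1")
    return min(f + 2, t + 1, n)


if __name__ == "__main__":
    # Failure-free run: the f + 2 term dominates, so two rounds suffice.
    print(round_upper_bound(n=10, t=4, f=0))  # 2
    # All t tolerated crashes occur: the t + 1 term dominates.
    print(round_upper_bound(n=10, t=4, f=4))  # 5
    # t = n - 1: the n term caps the bound.
    print(round_upper_bound(n=3, t=2, f=2))   # 3
```

The early-stopping behaviour shows in the first call: when few processes actually crash, the bound depends on f rather than on the worst-case parameter t.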
