Solving Problems in the Presence of Process Crashes and Lossy Links

We study the effect of link failures on the solvability of problems in asynchronous systems that are subject to process crashes: given a problem that can be solved in a system with process crashes and reliable links, is the problem still solvable if links are lossy? We answer this question for two types of lossy links, and show that the answer depends on the maximum number of processes that may crash and on the nature of the problem to be solved. In particular, we prove that the answer is positive if fewer than half of the processes may crash or if the problem specification does not refer to the state of processes that crash. In general, however, the answer is negative even if each link can lose only a finite number of messages.
