The effects of link failures on computations in asynchronous rings

We investigate the message complexity of distributed computations on rings of asynchronous processors. In such computations, each processor has an initial local value and the task is to compute some predetermined function of all local values. Our work deviates from the traditional approach to the complexity of ring computations in that we consider the effect of link failures. We show that the complexity of any non-trivial function is Θ(n log n) messages when n, the number of processors, is a-priori known, and is Θ(n^2) when n is not known. Interestingly, these tight bounds do not depend on whether the identity of a leader is a-priori known before the computation starts. These results stand in sharp contrast to the situation in an asynchronous ring with no link failures.

*) On leave from the Computer Science Dept., Technion, Haifa, Israel. Partially supported by a Weizmann Postdoc. Fellowship, an IBM Postdoc. Fellowship, and NSF Grant DCR-8509905.

Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the ACM copyright notice and the title of the publication and its date appear, and notice is given that copying is by permission of the Association for Computing Machinery. To copy otherwise, or to republish, requires a fee and/or specific permission.

© 1986 ACM 0-89791-198-9/86/0800-0174 75¢

1. INTRODUCTION

Much attention has recently been concentrated on modeling and analyzing the behaviour of communication networks. Message transmission delays in these networks are usually unpredictable and hard to quantify. Thus, communication networks are commonly modeled as asynchronous. Another disturbing phenomenon is that links may fail, resulting in the loss of messages. It is therefore reasonable to introduce the notion of link failures into the model. This may be done by postulating that a message sent on a non-faulty link will eventually arrive, while a message sent on a faulty link may not arrive at all. Since there is no bound on transmission delays, there is no way to distinguish a message which has not yet arrived from a message which has been lost. Thus we postulate that it is impossible to detect a link failure during the execution of a distributed algorithm. (This fault model originates from [SR] and [IR2].)

Developing and proving correctness of algorithms in the above framework is fairly involved. A helpful methodology consists of using high level abstractions such as Global Computation [SR]. Global Computation is the task of computing a predetermined function of values which are scattered among the network processors. That is, initially each processor contains a local value and the task is to reach a situation in which each processor stores the value of the function applied to all the local values. Typical examples of useful functions are MAXIMUM, AND, XOR etc. The task (of Global Computation) can either be initiated by a distinct processor (the leader) or be initiated distributively by any single processor (or group of uncoordinated processors).

In this paper we study the message complexity of implementing Global Computation in the above communication model. We concentrate on networks with ring configuration. Such networks are of special interest for applications, and may also serve as a first step towards dealing with the problem in its full generality. Ring configurations were considered in the study of various problems regarding distributed algorithms (e.g. [L], [B], [IR1], [M], [FL], [V], [ASW], [CR], [HS], [F], [DKR] and [P]).

We explore the effect of a-priori common knowledge, such as the ring size and/or the identity of a leader, on the complexity of implementing Global Computation. We show that in the presence of link failures the complexity does not depend on a-priori knowledge of a leader, but rather depends on whether the size of the ring is a-priori known. These results are surprising when compared to the situation in the asynchronous model with no link failures. When there are no faults, Global Computation is much easier with the network having a leader, and does not become easier with a-priori knowledge of the size of the ring.

We demonstrate non-linear lower bounds on the message complexity of Global Computation in the presence of link failures. For the lower bounds, we assume that a leader exists and is a-priori known to all processors. We consider two cases. First we assume that each processor knows n, the number of processors in the ring. In this case, we show that Global Computation requires Ω(n log n) messages. Next we show that in case n is not a-priori known, Global Computation requires Ω(n^2) messages. The first lower bound refers also to algorithms which are designed to operate only on a ring of a particular size, while the second bound refers only to algorithms which must operate on any ring.

We demonstrate the tightness of both lower bounds, without assuming that a leader exists a-priori. In particular, we present leader election algorithms for both cases (n known and n unknown) with message complexity O(n log n) and O(n^2) respectively.

Our results concerning the complexity of Global Computation in the presence of link failures are tabulated below. The table also compares our results with the known results for asynchronous rings without link failures.

[Table not recovered. Surviving column header: LINK FAILURES (our results)]
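
The failure-free baseline mentioned above can be made concrete. The following is a minimal sketch, not taken from the paper: it simulates Global Computation on a failure-free unidirectional ring with a known leader, assuming the function is built from an associative and commutative binary operation such as MAXIMUM or XOR. The function name `global_computation_with_leader` and its parameters are hypothetical. The leader circulates a token that accumulates the local values and then announces the result, using 2n - 1 messages in total.

```python
from typing import Callable, List, Tuple, TypeVar

T = TypeVar("T")


def global_computation_with_leader(
    values: List[T],
    combine: Callable[[T, T], T],
    leader: int = 0,
) -> Tuple[List[T], int]:
    """Simulate Global Computation on a failure-free unidirectional ring.

    Assumptions (illustrative, not from the paper): no link ever fails,
    the leader is known, the ring has at least two processors, and
    `combine` is associative and commutative.
    Returns the value stored at each processor and the message count.
    """
    n = len(values)
    messages = 0

    # Phase 1: a token leaves the leader and travels once around the ring.
    # Each hop is one message; every non-leader folds in its local value.
    acc = values[leader]
    pos = leader
    for _ in range(n):
        pos = (pos + 1) % n
        messages += 1
        if pos != leader:
            acc = combine(acc, values[pos])

    # The token is back at the leader, which now holds f(all local values).
    results = [acc] * n

    # Phase 2: the leader announces the result to the other n - 1 processors.
    messages += n - 1
    return results, messages


results, messages = global_computation_with_leader([3, 9, 4, 1, 7], max)
print(results)   # [9, 9, 9, 9, 9]
print(messages)  # 9, i.e. 2n - 1 for n = 5
```

This scheme relies on every message eventually arriving. If a link could silently drop the token, the leader would wait forever, since under the fault model a lost message cannot be distinguished from a delayed one.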

Oded Goldreich | Liuba Shrira

[1] Nancy A. Lynch, et al. The impact of synchronous communication on the problem of electing a leader in a ring, 1984, STOC '84.

[2] Michael Merritt. Elections in the presence of faults, 1984, PODC '84.

[3] Gérard Le Lann, et al. Distributed Systems - Towards a Formal Approach, 1977, IFIP Congress.

[4] Alon Itai, et al. The Multi-Tree Approach to Reliability in Distributed Networks, 1988, Inf. Comput.

[5] Daniel S. Hirschberg, et al. Decentralized extrema-finding in circular configurations of processors, 1980, CACM.

[6] Gary L. Peterson, et al. An O(n log n) Unidirectional Algorithm for the Circular Extrema Problem, 1982, TOPL.

[7] Baruch Awerbuch, et al. Efficient and reliable broadcast is achievable in an eventually connected network (Extended Abstract), 1984, PODC '84.

[8] Hagit Attiya, et al. Computing on an anonymous ring, 1988, JACM.

[9] Nancy A. Lynch, et al. On Describing the Behavior and Implementation of Distributed Systems, 1979, Semantics of Concurrent Computation.

[10] Paul M. B. Vitányi. Distributed elections in an archimedean ring of processors, 1984, STOC '84.

[11] Ernest J. H. Chang, et al. An improved algorithm for decentralized extrema-finding in circular configurations of processes, 1979, CACM.

[12] W. Randolph Franklin. On an improved algorithm for decentralized extrema finding in circular configurations of processors, 1982, CACM.

[13] Michael J. Fischer, et al. The Consensus Problem in Unreliable Distributed Systems (A Brief Survey), 1983, FCT.