Reliable broadcast for fault-tolerance on local computer networks

The authors discuss the definition and design of a generic reliable communication architecture on a widely used, host-independent platform such as a local area network (LAN). Two relevant aspects are the use of nonreplicated LANs and of self-checking components. The protocol is innovative in that, although clockless and running on a nonreplicated network, it displays bounded execution times. The architecture can therefore reliably address real-time requirements. Support of high-performance real-time applications with this architecture is being seriously considered in the present phase of project Delta-4, a CEC Esprit II consortium designing an open, dependable distributed architecture. The authors' considerations regarding the synchronism properties of clockless protocols are being applied in this context.
