Resilient Objects in Broadcast Networks

An object is said to be resilient if operations on it can be performed even when some nodes of the network fail. To support resiliency, copies of the object are stored on different nodes, and access to the copies is coordinated. The properties of broadcast networks are used to devise a distributed scheme for implementing resilient objects. All copies of an object are equivalent: when an operation is requested on an object, it is performed on every copy. No special mechanisms are needed when some copies are unavailable because of node failures, provided that at least one active node holds a copy of the object and the network does not become partitioned. Simulation results indicate that, as the number of copies increases, the number of messages needed to perform an operation increases slowly and the response time for an operation decreases.
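The two properties stated above (every operation is applied to all copies, and an operation succeeds as long as one active node holds a copy) can be illustrated with a small sketch. This is not the paper's protocol, which the abstract does not spell out; the names `Node`, `BroadcastNetwork`, and `perform` are hypothetical, the replicated object is assumed to be a simple counter, and the broadcast is assumed to reach every active node with no partition. Bringing a recovered node's copy up to date is not modeled.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical node holding one copy of a counter object."""
    name: str
    alive: bool = True
    value: int = 0

    def deliver(self, delta: int) -> bool:
        """Apply the operation if this node is up; report whether it was applied."""
        if not self.alive:
            return False
        self.value += delta
        return True


class BroadcastNetwork:
    """Assumed broadcast medium: one send reaches every attached node."""

    def __init__(self, nodes):
        self.nodes = list(nodes)

    def broadcast(self, delta: int):
        """Deliver one operation to all nodes; return names of nodes that applied it."""
        return [node.name for node in self.nodes if node.deliver(delta)]


def perform(network: BroadcastNetwork, delta: int):
    """Perform an operation on all available copies.

    Succeeds if at least one active node holds a copy; otherwise raises.
    """
    applied = network.broadcast(delta)
    if not applied:
        raise RuntimeError("no active node holds a copy of the object")
    return applied


if __name__ == "__main__":
    nodes = [Node("a"), Node("b"), Node("c")]
    net = BroadcastNetwork(nodes)
    print(perform(net, 5))      # all three copies apply the operation
    nodes[1].alive = False      # node b fails
    print(perform(net, 2))      # the operation still succeeds on a and c
    print([(n.name, n.value) for n in nodes])
```

In this sketch the requester does nothing special when a node is down: the failed node simply does not apply the operation, and the remaining copies stay equivalent to one another.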
