Exploiting virtual synchrony in distributed systems

We describe applications of a virtually synchronous environment for distributed programming, which underlies a collection of distributed programming tools in the ISIS2 system. A virtually synchronous environment allows processes to be structured into process groups, and makes events such as broadcasts to the group as an entity, group membership changes, and even migration of an activity from one place to another appear to occur instantaneously; in other words, synchronously. A major advantage of this approach is that many aspects of a distributed application can be treated independently without compromising correctness. Moreover, user code that is designed as if the system were synchronous can often be executed concurrently. We argue that this approach to building distributed and fault-tolerant software is more straightforward, more flexible, and more likely to yield correct solutions than alternative approaches.
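To make the idea of events "appearing to occur instantaneously" concrete, the following is a minimal toy sketch, not the ISIS2 interface. It assumes a single in-memory sequencer that places broadcasts and membership changes in one order, so every member observes each message in the same membership view. All names here (ToyGroup, Member, run_demo) are hypothetical and introduced only for illustration; a real system has to achieve this ordering over an asynchronous network with failures.

```python
from dataclasses import dataclass, field


@dataclass
class Member:
    name: str
    log: list = field(default_factory=list)  # events in delivery order


class ToyGroup:
    """Hypothetical single-sequencer model of a process group (assumption)."""

    def __init__(self):
        self.members = {}  # name -> Member
        self.view_id = 0   # incremented on every membership change

    def join(self, name):
        self.members[name] = Member(name)
        self._install_view()

    def leave(self, name):
        del self.members[name]
        self._install_view()

    def _install_view(self):
        # A membership change is delivered to all current members as one event.
        self.view_id += 1
        view = (self.view_id, tuple(sorted(self.members)))
        for m in self.members.values():
            m.log.append(("view", view))

    def broadcast(self, sender, payload):
        # Every member of the current view receives the message in that view.
        for m in self.members.values():
            m.log.append(("msg", self.view_id, sender, payload))


def run_demo():
    g = ToyGroup()
    g.join("a")
    g.join("b")
    g.broadcast("a", "x=1")
    g.join("c")
    g.broadcast("b", "x=2")
    g.leave("a")
    g.broadcast("c", "x=3")
    return g
```

In this sketch a member that joins late sees exactly the same sequence of views and messages as the older members from its join onward, which is the property that lets application code reason as if group events were synchronous.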
