Primary-Backup Protocols: Lower Bounds and Optimal Implementations

We present a precise specification of the primary-backup approach. Then, for a variety of failure models, we prove lower bounds on the degree of replication, failover time, and worst-case blocking time for client requests. Finally, we outline primary-backup protocols and indicate which of our lower bounds are tight.
