Chapter 2 Replication Techniques for Availability

This chapter studies how to give clients access to a replicated object that is logically indistinguishable from a single, yet highly available, object. We study this problem under two different models. In the first, we assume that failures can be detected accurately. In the second, we drop this assumption, which makes the model more realistic but also significantly more challenging. Under the first model, we present the primary-backup and chain replication techniques. Under the second model, we present techniques based on voting. We conclude with a discussion of reconfiguration.
