AUTOMATIC REPLICATION FOR HIGHLY AVAILABLE SERVICES

Replicating components of a system is a common technique for providing highly available services in the presence of failures. A replication scheme is a mechanism for organizing these replicas so that, as a group, they provide a service with the same semantics as the original unreplicated service. Viewstamped replication is a new replication scheme for providing high availability. This thesis describes an implementation of viewstamped replication in the context of the Argus programming language and run-time system. The programmer writes an Argus program that provides a service without worrying about availability. The run-time system then replicates the service automatically using viewstamped replication, making it highly available. Performance measurements indicate that this method makes a program highly available without degrading its performance.
