Recovering in large distributed systems with replicated data

The problem of recovery in large-scale, transaction-based distributed systems with replicated data is studied. In large distributed systems, the cost of accessing data items may be considerably greater because of the distances involved, so it is important to exploit replication to reduce data-access times. Failure events are also much more frequent in large systems than in small ones. Costly recovery protocols, such as those needed to update stale, newly recovered replicas or to resolve the uncertainty of recovering replicas, must therefore be avoided. These protocols are called dependent recovery protocols, since they require a recovering site to consult other sites before it can be reintegrated into the distributed system. Independent recovery has been proved unattainable in one-copy systems. It is shown, by presenting such a protocol, that independent recovery is possible in systems with replicated data. Simulation and analytical studies of the protocol's performance and availability characteristics are reported.
