Reliability issues for fully replicated distributed databases

Reliability is commonly considered to be one of the main advantages of distributed databases, but there are problems that must be overcome before it can be achieved. Exactly what is reliability? There are many types of failures, and we should know which types the system will be protected against. It is also necessary to define how the system will react to those failures. A considerable number of special protocols must be implemented with great care in order to realize the desired level of reliability.

The purpose of this article is to discuss the design alternatives and problems that must be solved in order to make a completely replicated distributed database reliable. A distributed database is completely replicated if a copy of the value of any data item is stored at every node (or site) in the system. A completely replicated distributed database is a special case of a general distributed database, and there are several reasons for studying the reliability issues in this limited context:

(1) Complete replication simplifies the problems. Making a distributed database reliable is not a simple matter, so as a first step we can try to understand a simplified specific case.
(2) Replicated data is the key to making data available after failures, so we will actually be concentrating on the critical component of distributed databases.
(3) Transaction processing (with no failures) in a completely replicated distributed database is well understood and a good number of papers have been written on the subject [1-5].

Most of the research for reliable distributed databases has been performed in the context of particular transaction processing algorithms. (There are exceptions, however [6, 7].) This means that only the alternatives best suited to the transaction processing algorithm are considered. In this article this will be avoided by not advocating a particular transaction processing algorithm.

A person designing a reliable distributed database is faced with a set of choices. Here these choices are divided into seven broad categories and each category will be discussed in one of the following sections. For each choice that must be made by the designer, I will attempt to give the major alternatives available and then list the implications of these alternatives. Many of the mechanisms for reliable distributed databases that we will discuss have been presented in the literature. I would like to caution the reader that my division of choices is somewhat arbitrary and that the …
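
The definition of complete replication given above can be stated as a simple check. The following is only a minimal sketch in Python, not something taken from the article: the function name `is_completely_replicated`, the `placement` mapping from node names to the items each node stores, and the sample node and item names are all hypothetical and chosen for illustration.

```python
from typing import Dict, Hashable, Set


def is_completely_replicated(
    placement: Dict[str, Set[Hashable]],
    items: Set[Hashable],
) -> bool:
    """Return True if every node stores a copy of every data item.

    `placement` maps a node (site) name to the set of items it holds;
    `items` is the full set of data items in the database.
    Both structures are illustrative assumptions.
    """
    return all(items <= held for held in placement.values())


# Hypothetical two-node system with two data items.
items = {"x", "y"}
full = {"node_a": {"x", "y"}, "node_b": {"x", "y"}}
partial = {"node_a": {"x", "y"}, "node_b": {"x"}}

print(is_completely_replicated(full, items))
print(is_completely_replicated(partial, items))
```

In the second placement, `node_b` lacks a copy of `y`, so that system is a general (partially replicated) distributed database rather than a completely replicated one.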

[1]  Robert H. Thomas, et al. A Majority consensus approach to concurrency control for multiple copy databases, 1979, ACM Trans. Database Syst.

[2]  Susan B. Davidson, et al. An optimistic protocol for partitioned distributed database systems, 1982.

[3]  Dale Skeen, et al. A Quorum-Based Commit Protocol, 1982, Berkeley Workshop.

[4]  Hector Garcia-Molina. Performance of update algorithms for replicated data in a distributed database, 1979.

[5]  Leslie Lamport, et al. Time, clocks, and the ordering of events in a distributed system, 1978, CACM.

[6]  Michael Stonebraker, et al. Concurrency Control and Consistency of Multiple Copies of Data in Distributed Ingres, 1979, IEEE Transactions on Software Engineering.

[7]  Robert H. Thomas. A Majority consensus approach to concurrency control for multiple copy databases, 1979.

[8]  J. D. Day, et al. A principle for resilient sharing of distributed resources, 1976, ICSE '76.

[9]  Irving L. Traiger, et al. The notions of consistency and predicate locks in a database system, 1976, CACM.

[10]  Gordon Bell, et al. Ethernet: Distributed Packet Switching for Local Computer Networks, 1976.

[11]  Daniel A. Menascé, et al. A locking protocol for resource coordination in distributed databases, 1978, SIGMOD Conference.

[12]  David K. Gifford, et al. Weighted voting for replicated data, 1979, SOSP '79.

[13]  Christos H. Papadimitriou, et al. The Concurrency Control Mechanism of SDD-1: A System for Distributed Databases (The Fully Redundant Case), 1978, IEEE Transactions on Software Engineering.

[14]  Nathan Goodman, et al. A Survey of Research and Development in Distributed Database Management, 1977, VLDB.

[15]  Clarence A. Ellis, et al. Consistency and correctness of duplicate database systems, 1977, SOSP '77.

[16]  Fred B. Schneider, et al. Synchronization in Distributed Programs, 1982, TOPL.