Elections in a Distributed Computing System

After a failure occurs in a distributed computing system, it is often necessary to reorganize the active nodes so that they can continue to perform a useful task. The first step in such a reorganization or reconfiguration is to elect a coordinator node to manage the operation. This paper discusses such elections and reorganizations. Two types of reasonable failure environments are studied. For each environment, assertions that define the meaning of an election are presented, together with an election algorithm that satisfies them.
