Reaching Approximate Agreement with Mixed-Mode Faults

In a fault-tolerant distributed system, different non-faulty processes may arrive at different values for a given system parameter. To resolve this disagreement, processes must exchange and vote upon their respective local values. Faulty processes may attempt to inhibit agreement by acting in a malicious or "Byzantine" manner. Approximate agreement is a form of agreement in which the voted values obtained by the non-faulty processes need not be identical; they need only agree to within a predefined tolerance. Approximate agreement can be achieved by a sequence of convergent voting rounds, in which the range of values held by non-faulty processes is reduced in each round. Historically, each new convergent voting algorithm has been accompanied by ad hoc proofs of its convergence rate and fault tolerance, using an overly conservative fault model in which all faults exhibit worst-case Byzantine behavior. This paper presents a general method to quickly determine the convergence rate and fault tolerance of any member of a broad family of convergent voting algorithms. The method is developed under a realistic mixed-mode fault model comprising asymmetric, symmetric, and benign fault modes. The results are employed to analyze the properties of several existing voting algorithms more accurately, to derive a sub-family of optimal mixed-mode voting algorithms, and to quickly determine the properties of proposed new voting algorithms.
