Unreliable intrusion detection in distributed computations

Distributed coordination is difficult, especially when the system may suffer intrusions that corrupt some of its component processes. We introduce the abstraction of a failure detector that a process can use to (imperfectly) detect the corruption (Byzantine failure) of another process. In general, our failure detectors can be unreliable, either by reporting a correct process to be faulty or by reporting a faulty process to be correct. However, we show that if these detectors satisfy certain plausible properties, then the well-known distributed consensus problem can be solved. We also present a randomized protocol using failure detectors that solves the consensus problem either if the requisite properties of the failure detectors hold or if certain highly probable events eventually occur. This work can be viewed as a generalization of the benign failure detectors popular in the distributed computing literature.
