From Byzantine Failures to Crash Failures in Message-Passing Systems: a BG Simulation-based approach

The BG simulation is a powerful reduction algorithm designed for asynchronous read/write crash-prone systems. It allows a set of $(t+1)$ asynchronous sequential processes to wait-free simulate (i.e., despite the crash of up to $t$ of them) an arbitrary number $n$ of processes under the assumption that at most $t$ of them may crash. The BG simulation shows that, in read/write systems, the crucial parameter is not the number $n$ of processes, but the upper bound $t$ on the number of process crashes. This paper extends the concept of BG simulation to asynchronous message-passing systems prone to Byzantine failures. Byzantine failures are the most general type of failure: a faulty process can exhibit arbitrary behavior. Because of this, they are also the most difficult to analyze and to handle algorithmically. The main contribution of the paper is a signature-free reduction of Byzantine failures to crash failures. Assuming $t<\min(n',n/3)$, the paper presents an algorithm that simulates a system of $n'$ processes where up to $t$ may crash, on top of a basic system of $n$ processes where up to $t$ may be Byzantine. While topological techniques have been used to relate the computability of Byzantine failure-prone systems to that of crash failure-prone ones, this simulation is the first, to our knowledge, that establishes this relation directly, in an algorithmic way. Besides extending the basic BG simulation to message-passing systems and to failures more severe than process crashes, the simulation is modular and direct, and it provides deeper insight into the nature of crash and Byzantine failures in asynchronous message-passing systems. Moreover, it allows crash-tolerant algorithms designed for asynchronous read/write systems to be executed on top of asynchronous message-passing systems prone to Byzantine failures.
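To make the resilience condition $t<\min(n',n/3)$ concrete, the following minimal sketch checks it for given parameters and computes the largest integer $t$ that satisfies it. It only illustrates the arithmetic of the condition stated above and is not part of the simulation algorithm; the function names `condition_holds` and `max_tolerated_faults` are hypothetical and do not come from the paper.

```python
def condition_holds(t: int, n: int, n_prime: int) -> bool:
    """Return True if t < min(n', n/3), using integer arithmetic only.

    t       : upper bound on the number of faulty processes
    n       : number of processes in the basic (Byzantine) system
    n_prime : number of simulated (crash-prone) processes
    """
    # t < n/3 is equivalent to 3*t < n for integers, which avoids floats.
    return t >= 0 and t < n_prime and 3 * t < n


def max_tolerated_faults(n: int, n_prime: int) -> int:
    """Largest integer t with t < min(n', n/3), or -1 if no t >= 0 exists."""
    if n < 1 or n_prime < 1:
        return -1
    # t < n'   gives t <= n' - 1
    # 3t < n   gives t <= (n - 1) // 3
    return min(n_prime - 1, (n - 1) // 3)


if __name__ == "__main__":
    # With n = 10 basic processes and n' = 4 simulated processes,
    # both bounds allow at most t = 3.
    print(max_tolerated_faults(10, 4))
    print(condition_holds(3, 10, 4), condition_holds(4, 10, 4))
```

For instance, with $n = 10$ and $n' = 4$, the bound $t < n/3$ permits $t \le 3$ and the bound $t < n'$ also permits $t \le 3$, so $t = 3$ satisfies the condition while $t = 4$ does not.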
