Lumière: Byzantine View Synchronization

Many distributed protocols in the partial synchrony setting with Byzantine nodes divide the local state of the nodes into views, and the transition from one view to the next dictates a leader change. In order to provide liveness, all honest nodes need to stay in the same view for a sufficiently long time. This requires intricate mechanisms that are typically intertwined with the rest of the protocol, making the protocol hard to understand and to reason about. Furthermore, state-machine replication, which consists of multiple instances of single-shot consensus, can use the same view synchronization protocol across those instances. We define the Byzantine View Synchronization problem, which is responsible for eventually bringing all nodes to the same view for a sufficiently long time. Two approaches for implementing a protocol that achieves view synchronization exhibit the following tradeoffs: a view doubling solution has zero communication costs but unbounded latency, while a broadcast-based solution has quadratic communication costs but constant latency. We describe both protocols, prove their correctness, and also introduce a third protocol, named Lumière, that has optimistically linear communication complexity and constant latency and that, when faced with benign failures, has expected linear communication and constant latency. Lumière is particularly useful for a family of consensus protocols that exhibit linear communication under various circumstances.
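The tradeoff stated for view doubling can be illustrated with a small sketch. It assumes the usual reading of "view doubling", which the abstract does not spell out: each node advances views using only its local clock, and view v lasts twice as long as view v-1. The names `view_at` and `overlap`, and the `base` duration parameter, are hypothetical and chosen for illustration only; this is not the paper's pseudocode.

```python
def view_at(elapsed: float, base: float = 1.0) -> int:
    """View a node occupies after `elapsed` time units on its local clock.

    Assumption: view v lasts base * 2**v, so no messages are needed
    to advance; the node just watches its own timer.
    """
    if elapsed < 0:
        raise ValueError("elapsed time must be non-negative")
    view = 0
    duration = base
    while elapsed >= duration:
        elapsed -= duration
        duration *= 2  # each view is twice as long as the previous one
        view += 1
    return view


def overlap(start_a: float, start_b: float, view: int, base: float = 1.0) -> float:
    """Time during which two nodes, whose local clocks started at
    start_a and start_b, are both in `view`."""
    def begin(start: float) -> float:
        return start + base * (2 ** view - 1)

    def end(start: float) -> float:
        return start + base * (2 ** (view + 1) - 1)

    latest_begin = max(begin(start_a), begin(start_b))
    earliest_end = min(end(start_a), end(start_b))
    return max(0.0, earliest_end - latest_begin)
```

Under these assumptions, two nodes whose clocks start 5 time units apart share no time in views 0 through 2, and first overlap in view 3. The overlap keeps growing in later views, which is why synchronization is eventually reached with no communication, but the wait grows with the initial skew, which matches the unbounded latency noted above.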
