EBAWA: Efficient Byzantine Agreement for Wide-Area Networks

The popularity of wide-area computer services has created a compelling need for efficient algorithms that provide high reliability. Byzantine fault-tolerant (BFT) algorithms can be used for this purpose because they allow replicated systems to continue providing a correct service even when some of their replicas fail arbitrarily, whether accidentally or because of malicious faults. Current BFT algorithms perform well on LANs, but when the replicas are geographically distributed, their performance suffers from the lower bandwidth and the higher, more heterogeneous network latencies. This paper proposes and evaluates a novel BFT algorithm for WANs that requires fewer communication steps and fewer replicas, and achieves better throughput and latency, than others in the literature. The paper presents an extensive evaluation of the algorithm’s performance in several settings and conditions: in a LAN, in real and emulated WANs, with clients close to the servers and with clients dispersed geographically, and with similar and different communication latencies between clients and servers.
