Consensus-Oriented Parallelization: How to Earn Your First Million

Consensus protocols employed in Byzantine fault-tolerant systems are notoriously compute intensive. Unfortunately, the traditional approach to execute instances of such protocols in a pipelined fashion is not well suited for modern multi-core processors and fundamentally restricts the overall performance of systems based on them. To solve this problem, we present the consensus-oriented parallelization (COP) scheme, which disentangles consecutive consensus instances and executes them in parallel by independent pipelines; or to put it in the terminology of our main target, today's processors: COP is the introduction of superscalarity to the field of consensus protocols. In doing so, COP achieves 2.4 million operations per second on commodity server hardware, a factor of 6 compared to a contemporary pipelined approach measured on the same code base and a factor of over 20 compared to the highest throughput numbers published for such systems so far. More important, however, is: COP provides up to 3 times as much throughput on a single core than its competitors and it can make use of additional cores where other approaches are confined by the slowest stage in their pipeline. This enables Byzantine fault tolerance for the emerging market of extremely demanding transactional systems and gives more room for conventional deployments to increase their quality of service.

[1]  Fernando Pedone,et al.  Building global and scalable systems with atomic multicast , 2014, Middleware.

[2]  Vivien Quéma,et al.  RBFT: Redundant Byzantine Fault Tolerance , 2013, 2013 IEEE 33rd International Conference on Distributed Computing Systems.

[3]  Shekhar Y. Borkar,et al.  Designing reliable systems from unreliable components: the challenges of transistor variability and degradation , 2005, IEEE Micro.

[4]  Miguel Correia,et al.  Spin One's Wheels? Byzantine Fault Tolerance with a Spinning Primary , 2009, 2009 28th IEEE International Symposium on Reliable Distributed Systems.

[5]  Alysson Neves Bessani,et al.  State Machine Replication for the Masses with BFT-SMART , 2014, 2014 44th Annual IEEE/IFIP International Conference on Dependable Systems and Networks.

[6]  Ramakrishna Kotla,et al.  High throughput Byzantine fault tolerance , 2004, International Conference on Dependable Systems and Networks, 2004.

[7]  Yang Wang,et al.  All about Eve: Execute-Verify Replication for Multi-Core Servers , 2012, OSDI.

[8]  David E. Culler,et al.  SEDA: an architecture for well-conditioned, scalable internet services , 2001, SOSP.

[9]  Luiz André Barroso,et al.  The Datacenter as a Computer: An Introduction to the Design of Warehouse-Scale Machines, Second Edition , 2013, The Datacenter as a Computer: An Introduction to the Design of Warehouse-Scale Machines, Second Edition.

[10]  Douglas C. Schmidt,et al.  Transport System Architecture Services for High-Performance Communications Systems , 1993, IEEE J. Sel. Areas Commun..

[11]  Fred B. Schneider,et al.  Implementing fault-tolerant services using the state machine approach: a tutorial , 1990, CSUR.

[12]  Arun Venkataramani,et al.  Separating agreement from execution for byzantine fault tolerant services , 2003, SOSP '03.

[13]  Flavio Paiva Junqueira,et al.  Scalable Agreement: Toward Ordering as a Service , 2010, HotDep.

[14]  Mahadev Konar,et al.  ZooKeeper: Wait-free Coordination for Internet-scale Systems , 2010, USENIX ATC.

[15]  Leslie Lamport,et al.  The Byzantine Generals Problem , 1982, TOPL.

[16]  Miguel Oom Temudo de Castro,et al.  Practical Byzantine fault tolerance , 1999, OSDI '99.

[17]  Tudor David,et al.  Everything you always wanted to know about synchronization but were afraid to ask , 2013, SOSP.

[18]  Tobias Distler,et al.  Increasing performance in byzantine fault-tolerant systems with on-demand replica consistency , 2011, EuroSys '11.

[19]  Miguel Correia,et al.  Efficient Byzantine Fault-Tolerance , 2013, IEEE Transactions on Computers.

[20]  André Schiper,et al.  Achieving High-Throughput State Machine Replication in Multi-core Systems , 2013, 2013 IEEE 33rd International Conference on Distributed Computing Systems.

[21]  Luiz André Barroso,et al.  The Datacenter as a Computer: An Introduction to the Design of Warehouse-Scale Machines , 2009, The Datacenter as a Computer: An Introduction to the Design of Warehouse-Scale Machines.

[22]  Dong Zhou,et al.  Rex: replication at the speed of multi-core , 2014, EuroSys '14.

[23]  John Lane,et al.  Prime: Byzantine Replication under Attack , 2011, IEEE Transactions on Dependable and Secure Computing.

[24]  Johannes Behl,et al.  CheapBFT: resource-efficient byzantine fault tolerance , 2012, EuroSys '12.

[25]  Fernando Pedone,et al.  Rethinking State-Machine Replication for Parallelism , 2013, 2014 IEEE 34th International Conference on Distributed Computing Systems.

[26]  Michael K. Reiter,et al.  Fault-scalable Byzantine fault-tolerant services , 2005, SOSP '05.

[27]  Sangmin Lee,et al.  Upright cluster services , 2009, SOSP '09.

[28]  Marko Vukolic,et al.  The Next 700 BFT Protocols , 2015, ACM Trans. Comput. Syst..