Network Hardware-Accelerated Consensus

Consensus protocols are the foundation for building many fault-tolerant distributed systems and services. This paper argues that offering consensus as a network service (CAANS) yields significant performance benefits. CAANS leverages recent advances in commodity networking hardware design and programmability to implement consensus protocol logic in network devices. CAANS provides a complete Paxos protocol, is a drop-in replacement for software-based implementations of Paxos, places no restrictions on network topologies, and is implemented in a high-level, data-plane programming language, which allows for portability across a range of target devices. At the same time, CAANS significantly increases throughput and reduces latency for consensus operations. Consensus logic executing in hardware can transmit consensus messages at line speed, with latency only slightly higher than that of simply forwarding packets.
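
To make concrete the kind of protocol logic that could be moved into a network device, the following is a minimal software sketch of the textbook Paxos acceptor role for a single consensus instance. It is not the CAANS implementation: the names `AcceptorState`, `on_phase1a`, and `on_phase2a`, the message tuples, and the use of Python rather than a data-plane language are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class AcceptorState:
    """Per-instance acceptor state (hypothetical layout, single instance)."""
    rnd: int = 0                   # highest round this acceptor has promised
    vrnd: int = 0                  # round in which a value was last accepted
    value: Optional[bytes] = None  # last accepted value, if any


def on_phase1a(state: AcceptorState, rnd: int) -> Optional[Tuple]:
    """Handle a Phase 1A (prepare) message.

    Promise only if the round is strictly higher than any promised so far,
    and report the last accepted round and value back to the proposer.
    """
    if rnd > state.rnd:
        state.rnd = rnd
        return ("1B", rnd, state.vrnd, state.value)
    return None  # stale round: ignore


def on_phase2a(state: AcceptorState, rnd: int, value: bytes) -> Optional[Tuple]:
    """Handle a Phase 2A (accept) message.

    Accept the value unless a higher round has already been promised.
    """
    if rnd >= state.rnd:
        state.rnd = rnd
        state.vrnd = rnd
        state.value = value
        return ("2B", rnd, value)
    return None  # stale round: ignore
```

Each handler is a comparison followed by a small state update, which hints at why this role is a plausible candidate for per-packet processing; how CAANS actually maps the state and messages onto hardware is not specified in this abstract.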
