The Impact of RDMA on Agreement

Remote Direct Memory Access (RDMA) is becoming widely available in data centers. This technology allows a process to directly read and write the memory of a remote host, with a mechanism to control access permissions. In this paper, we study the fundamental power of these capabilities. We consider the well-known problem of achieving consensus despite failures, and find that RDMA can improve the inherent trade-off in distributed computing between failure resilience and performance. Specifically, we show that RDMA allows algorithms that simultaneously achieve high resilience and high performance, whereas traditional algorithms had to choose one or the other. With Byzantine failures, we give an algorithm that requires only $n \geq 2f_P + 1$ processes (where $f_P$ is the maximum number of faulty processes) and decides in two (network) delays in common executions. With crash failures, we give an algorithm that requires only $n \geq f_P + 1$ processes and also decides in two delays. Both algorithms tolerate failures of a minority of the memories, a failure mode inherent to RDMA; they provide safety in asynchronous systems and liveness under standard additional assumptions.
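To make the resilience bounds above concrete, the following minimal sketch computes the smallest number of processes each bound permits for a given $f_P$. The function names are hypothetical and only encode the inequalities stated in the abstract; the symbol `m` for the number of memories is an assumption introduced here. The classic message-passing bounds ($3f + 1$ for Byzantine failures, $2f + 1$ for crash failures) are included only for comparison and are not taken from the abstract itself.

```python
def min_processes_byzantine_rdma(f_p: int) -> int:
    """Smallest n satisfying n >= 2*f_P + 1 (Byzantine bound from the abstract)."""
    return 2 * f_p + 1


def min_processes_crash_rdma(f_p: int) -> int:
    """Smallest n satisfying n >= f_P + 1 (crash bound from the abstract)."""
    return f_p + 1


def max_memory_failures(m: int) -> int:
    """Largest number of failed memories that is still a strict minority of m.

    Assumption: "a minority of memory failures" is read as 2*f_M < m.
    """
    return (m - 1) // 2


# Classic message-passing bounds, shown only for comparison (not from the abstract).
def min_processes_byzantine_classic(f: int) -> int:
    return 3 * f + 1


def min_processes_crash_classic(f: int) -> int:
    return 2 * f + 1


if __name__ == "__main__":
    for f in (1, 2, 3):
        print(
            f"f_P={f}: Byzantine RDMA n>={min_processes_byzantine_rdma(f)} "
            f"(classic {min_processes_byzantine_classic(f)}), "
            f"crash RDMA n>={min_processes_crash_rdma(f)} "
            f"(classic {min_processes_crash_classic(f)})"
        )
```

For example, tolerating one faulty process needs at least three processes under the Byzantine bound and two under the crash bound, and a deployment with three memories can lose at most one of them.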
