Exploiting Commutativity For Practical Fast Replication

Traditional approaches to replication require client requests to be ordered before they are made durable by copying them to replicas. As a result, clients must wait for two round-trip times (RTTs) before updates complete. In this paper, we show that this entanglement of ordering and durability is unnecessary for strong consistency. The Consistent Unordered Replication Protocol (CURP) allows clients to replicate requests that have not yet been ordered, as long as they are commutative. This strategy allows most operations to complete in 1 RTT (the same as an unreplicated system). We implemented CURP in the Redis and RAMCloud storage systems. In RAMCloud, CURP improved write latency by ~2x (13.8 µs -> 7.3 µs) and write throughput by 4x. Compared to unreplicated RAMCloud, CURP's latency overhead for 3-way replication is just 0.4 µs (6.9 µs vs 7.3 µs). CURP transformed a non-durable Redis cache into a consistent and durable storage system with only a small performance overhead.
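
The core idea, that an unordered request can be made durable right away provided it commutes with every other request still awaiting ordering, can be illustrated with a minimal sketch. This is not the implementation from the paper: `Request`, `UnorderedStore`, `commutes`, and `replicate_unordered` are hypothetical names, and the sketch assumes that two updates commute exactly when they write disjoint sets of keys.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Request:
    """A client update; `keys` is the set of keys it writes (hypothetical model)."""
    request_id: int
    keys: frozenset


def commutes(a: Request, b: Request) -> bool:
    # Assumption: two updates commute when they touch disjoint keys.
    return a.keys.isdisjoint(b.keys)


@dataclass
class UnorderedStore:
    """Holds requests that are durable but not yet ordered (hypothetical)."""
    pending: dict = field(default_factory=dict)

    def record(self, req: Request) -> bool:
        # Accept only if the request commutes with everything already held,
        # so the order in which held requests are later applied cannot matter.
        if all(commutes(req, other) for other in self.pending.values()):
            self.pending[req.request_id] = req
            return True
        return False

    def drop(self, request_ids) -> None:
        # Called once the listed requests have been ordered and replicated.
        for rid in request_ids:
            self.pending.pop(rid, None)


def replicate_unordered(req: Request, stores) -> bool:
    """Return True if every store accepted `req` without ordering (fast path).

    False means a conflict was found and the caller must fall back to the
    traditional ordered path.
    """
    accepted = [store.record(req) for store in stores]
    return all(accepted)
```

With three stores, writes to `x` and then `y` both take the fast path, while a second write to `x` is rejected until the first has been ordered and dropped. The sketch deliberately leaves out networking, recovery, and the ordered fallback itself.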
