Tolerating Slowdowns in Replicated State Machines using Copilots

Replicated state machines are linearizable, fault-tolerant groups of replicas that are coordinated using a consensus algorithm. Copilot replication is the first 1-slowdown-tolerant consensus protocol: it delivers normal latency despite the slowdown of any one replica. Copilot uses two distinguished replicas—the pilot and the copilot—to proactively add redundancy to every stage of processing a client’s command. Copilot uses dependencies and deduplication to resolve the potentially differing orderings proposed by the two pilots. To prevent these dependencies from letting either pilot slow down the group, Copilot uses fast takeovers, which allow a fast pilot to complete the ongoing work of a slow pilot. Copilot includes two optimizations—ping-pong batching and null dependency elimination—that improve its performance when there are 0 and 1 slow pilots, respectively. Our evaluation shows Copilot’s performance is moderately lower than, but competitive with, MultiPaxos and EPaxos when no replicas are slow. When a replica is slow, Copilot is the only protocol that avoids high latencies.
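
To make the proactive redundancy concrete, below is a minimal Go sketch of the idea the abstract describes: every client command is handled by both the pilot and the copilot, and the client needs only the faster of the two paths. This is not the paper's implementation; the Command, Pilot, and Propose names are hypothetical stand-ins, and the shared command ID is only meant to suggest how replicas could deduplicate so each command executes once.

```go
// Minimal sketch (assumed names, not Copilot's actual code) of redundant
// command processing by two pilots with client-side "first reply wins".
package main

import "fmt"

// Command carries a client-chosen ID so replicas can deduplicate the two
// copies proposed by the pilot and the copilot.
type Command struct {
	ID uint64
	Op string
}

// Pilot models either distinguished replica; Propose stands in for the
// real ordering and commit path.
type Pilot struct{ name string }

func (p *Pilot) Propose(cmd Command, done chan<- string) {
	// In the real protocol this step would also record a dependency on
	// the other pilot's latest entry before committing and executing.
	done <- fmt.Sprintf("%s executed command %d (%s)", p.name, cmd.ID, cmd.Op)
}

func main() {
	pilot, copilot := &Pilot{"pilot"}, &Pilot{"copilot"}
	cmd := Command{ID: 1, Op: "put x=1"}

	// Proactive redundancy: both pilots process every command.
	done := make(chan string, 2)
	go pilot.Propose(cmd, done)
	go copilot.Propose(cmd, done)

	// The client takes whichever result arrives first, so one slow
	// pilot does not add latency.
	fmt.Println(<-done)
}
```

In the protocol itself, the dependencies between the two pilots' orderings and the fast-takeover mechanism described above are what keep this redundancy from letting a slow pilot stall the group; the sketch only illustrates the client-visible effect of duplicating and deduplicating each command.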
