Paxos made transparent

State machine replication (SMR) leverages distributed consensus protocols such as Paxos to keep multiple replicas of a program consistent in the face of replica failures or network partitions. This fault tolerance makes it enticing to implement a principled SMR system that replicates general programs, especially server programs that demand high availability. Unfortunately, SMR assumes deterministic execution, but most server programs are multithreaded and thus nondeterministic. Moreover, existing SMR systems provide narrow state machine interfaces to suit specific programs, and it can be quite strenuous and error-prone to orchestrate a general program into these interfaces. This paper presents Crane, an SMR system that transparently replicates general server programs. Crane achieves distributed consensus on the socket API, a common interface to almost all server programs. It leverages deterministic multithreading (specifically, our prior system Parrot) to make multithreaded replicas deterministic. It uses a new technique we call time bubbling to efficiently tackle the difficult challenge of nondeterministic network input timing. Evaluation on five widely used server programs (e.g., Apache, ClamAV, and MySQL) shows that Crane is easy to use, has moderate overhead, and is robust. Crane's source code is at github.com/columbia/crane.
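
The general SMR principle the abstract relies on can be illustrated with a toy sketch: if every replica is deterministic and applies the same agreed-upon sequence of inputs in the same order, all replicas end in the same state. The code below is not Crane's implementation. The `Replica` class, the event names, and the hard-coded log are hypothetical, and the consensus protocol that would produce the log is assumed and not shown.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """Toy deterministic state machine: a key-value store fed by socket-level events."""
    store: dict = field(default_factory=dict)
    applied: int = 0  # index of the next log entry to apply

    def apply(self, event):
        # Each event is (socket_call, payload); in this toy only "recv" carries a request.
        call, payload = event
        if call == "recv":
            key, value = payload
            self.store[key] = value
        self.applied += 1


def replicate(log, replicas):
    """Feed every replica the same sequence of events, in log order."""
    for replica in replicas:
        for event in log[replica.applied:]:
            replica.apply(event)


# Hypothetical agreed-upon log (standing in for the output of a consensus protocol).
log = [
    ("accept", None),
    ("recv", ("x", 1)),
    ("recv", ("x", 2)),
    ("close", None),
]

replicas = [Replica(), Replica(), Replica()]
replicate(log, replicas)
states = [r.store for r in replicas]
```

Because `apply` depends only on the event and the replica's current state, all entries of `states` are equal. A multithreaded server breaks this assumption unless its thread schedule is also made deterministic, which is the gap the abstract says deterministic multithreading fills.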
