Paxos Made Parallel

Standard state-machine replication involves consensus on a sequence of totally ordered requests through, for example, the Paxos protocol. Serialized request processing seriously limits our ability to leverage prevalent multicore servers. This tension between concurrency and consistency is not inherent because the total-ordering of requests is merely a simplifying convenience that is unnecessary for consistency. Replicated state machines can be made parallel by consensus on partial-order traces, rather than on totally ordered requests, that capture causal orders in one replica execution and to be replayed in an order-preserving manner on others. The result is a new multi-core friendly replicated state-machine framework that achieves strong consistency while preserving parallelism in multi-threaded applications. On 12-core machines with hyper-threading, evaluations on typical applications show that we can scale with the number of cores, achieving up to a 16 times throughput increase over standard replicated state machines.

[1]  Ravishankar K. Iyer,et al.  Active replication of multithreaded applications , 2006, IEEE Transactions on Parallel and Distributed Systems.

[2]  Sen Hu,et al.  Efficient system-enforced deterministic parallelism , 2010, OSDI.

[3]  Koen De Bosschere,et al.  RecPlay: a fully integrated practical record/replay system , 1999, TOCS.

[4]  Junfeng Yang,et al.  Efficient deterministic multithreading through schedule relaxation , 2011, SOSP.

[5]  Leslie Lamport,et al.  Time, clocks, and the ordering of events in a distributed system , 1978, CACM.

[6]  Robert Griesemer,et al.  Paxos made live: an engineering perspective , 2007, PODC '07.

[7]  Peng Li,et al.  Paxos Replicated State Machines as the Basis of a High-Performance Data Store , 2011, NSDI.

[8]  Yang Wang,et al.  All about Eve: Execute-Verify Replication for Multi-Core Servers , 2012, OSDI.

[9]  Leslie Lamport,et al.  The part-time parliament , 1998, TOCS.

[10]  LamportLeslie Time, clocks, and the ordering of events in a distributed system , 1978 .

[11]  Robbert van Renesse,et al.  Chain Replication for Supporting High Throughput and Availability , 2004, OSDI.

[12]  Marek Olszewski,et al.  Kendo: efficient deterministic multithreading in software , 2009, ASPLOS.

[13]  L. Ceze,et al.  The Deterministic Execution Hammer : How Well Does it Actually Pound Nails ? , 2011 .

[14]  Michael Burrows,et al.  The Chubby Lock Service for Loosely-Coupled Distributed Systems , 2006, OSDI.

[15]  Samuel T. King,et al.  ReVirt: enabling intrusion analysis through virtual-machine logging and replay , 2002, OPSR.

[16]  Yuanyuan Zhou,et al.  PRES: probabilistic replay with execution sketching on multiprocessors , 2009, SOSP '09.

[17]  Ion Stoica,et al.  ODR: output-deterministic replay for multicore debugging , 2009, SOSP '09.

[18]  Jason Nieh,et al.  Transparent, lightweight application execution replay on commodity multiprocessor operating systems , 2010, SIGMETRICS '10.

[19]  Satish Narayanasamy,et al.  Respec: Efficient Online Multiprocessor Replay via Speculation and External Determinism , 2010, ASPLOS 2010.

[20]  Dan Grossman,et al.  RCDC: a relaxed consistency deterministic computer , 2011, ASPLOS XVI.

[21]  Fred B. Schneider,et al.  Implementing fault-tolerant services using the state machine approach: a tutorial , 1990, CSUR.

[22]  Dan Grossman,et al.  CoreDet: a compiler and runtime system for deterministic multithreaded execution , 2010, ASPLOS XV.

[23]  Satish Narayanasamy,et al.  Detecting and surviving data races using complementary schedules , 2011, SOSP.

[24]  Satish Narayanasamy,et al.  DoublePlay: parallelizing sequential logging and replay , 2011, ASPLOS XVI.

[25]  Koen De Bosschere,et al.  JaRec: a portable record/replay environment for multi‐threaded Java applications , 2004, Softw. Pract. Exp..

[26]  Emery D. Berger,et al.  Dthreads: efficient deterministic multithreading , 2011, SOSP.

[27]  David A. Wood,et al.  Calvin: Deterministic or not? Free will to choose , 2011, 2011 IEEE 17th International Symposium on High Performance Computer Architecture.

[28]  Frank Tip,et al.  Dynamic detection of atomic-set-serializability violations , 2008, 2008 ACM/IEEE 30th International Conference on Software Engineering.

[29]  Michael Burrows,et al.  Eraser: a dynamic data race detector for multithreaded programs , 1997, TOCS.

[30]  Luis Ceze,et al.  Deterministic Process Groups in dOS , 2010, OSDI.

[31]  Shan Lu,et al.  ConSeq: detecting concurrency bugs through sequential errors , 2011, ASPLOS XVI.

[32]  Brandon Lucia,et al.  DMP: deterministic shared memory multiprocessing , 2009, IEEE Micro.

[33]  Leslie Lamport,et al.  Generalized Consensus and Paxos , 2005 .

[34]  J. D. Day,et al.  A principle for resilient sharing of distributed resources , 1976, ICSE '76.

[35]  André Schiper,et al.  JPaxos: State machine replication based on the Paxos protocol , 2011 .

[36]  Dutch T. Meyer,et al.  Remus: High Availability via Asynchronous Virtual Machine Replication. (Best Paper) , 2008, NSDI.

[37]  Xuezheng Liu,et al.  Usenix Association 8th Usenix Symposium on Operating Systems Design and Implementation R2: an Application-level Kernel for Record and Replay , 2022 .

[38]  Leslie Lamport,et al.  Paxos Made Simple , 2001 .