Rex: replication at the speed of multi-core

Standard state-machine replication involves consensus on a sequence of totally ordered requests through, for example, the Paxos protocol. Such a sequential execution model is becoming outdated on prevalent multi-core servers. Highly concurrent executions on multi-core architectures introduce non-determinism related to thread scheduling and lock contentions, and fundamentally break the assumption in state-machine replication. This tension between concurrency and consistency is not inherent because the total-ordering of requests is merely a simplifying convenience that is unnecessary for consistency. Concurrent executions of the application can be decoupled with a sequence of consensus decisions through consensus on partial-order traces, rather than on totally ordered requests, that capture the non-deterministic decisions in one replica execution and to be replayed with the same decisions on others. The result is a new multi-core friendly replicated state-machine framework that achieves strong consistency while preserving parallelism in multi-thread applications. On 12-core machines with hyper-threading, evaluations on typical applications show that we can scale with the number of cores, achieving up to 16 times the throughput of standard replicated state machines.

[1]  J. D. Day,et al.  A principle for resilient sharing of distributed resources , 1976, ICSE '76.

[2]  Leslie Lamport,et al.  Time, clocks, and the ordering of events in a distributed system , 1978, CACM.

[3]  Fred B. Schneider,et al.  Implementing fault-tolerant services using the state machine approach: a tutorial , 1990, CSUR.

[4]  Leslie Lamport,et al.  The part-time parliament , 1998, TOCS.

[5]  André Schiper,et al.  Generic Broadcast , 1999, DISC.

[6]  Koen De Bosschere,et al.  RecPlay: a fully integrated practical record/replay system , 1999, TOCS.

[7]  Leslie Lamport,et al.  Paxos Made Simple , 2001 .

[8]  Samuel T. King,et al.  ReVirt: enabling intrusion analysis through virtual-machine logging and replay , 2002, OPSR.

[9]  Robbert van Renesse,et al.  Chain Replication for Supporting High Throughput and Availability , 2004, OSDI.

[10]  Ramakrishna Kotla,et al.  High throughput Byzantine fault tolerance , 2004, International Conference on Dependable Systems and Networks, 2004.

[11]  Koen De Bosschere,et al.  JaRec: a portable record/replay environment for multi‐threaded Java applications , 2004, Softw. Pract. Exp..

[12]  David García,et al.  NonStop/spl reg/ advanced architecture , 2005, 2005 International Conference on Dependable Systems and Networks (DSN'05).

[13]  Leslie Lamport,et al.  Generalized Consensus and Paxos , 2005 .

[14]  Ravishankar K. Iyer,et al.  Active replication of multithreaded applications , 2006, IEEE Transactions on Parallel and Distributed Systems.

[15]  Brett D. Fleisch,et al.  The Chubby lock service for loosely-coupled distributed systems , 2006, OSDI '06.

[16]  Robert Griesemer,et al.  Paxos made live: an engineering perspective , 2007, PODC '07.

[17]  Xuezheng Liu,et al.  Usenix Association 8th Usenix Symposium on Operating Systems Design and Implementation R2: an Application-level Kernel for Record and Replay , 2022 .

[18]  Dutch T. Meyer,et al.  Remus: High Availability via Asynchronous Virtual Machine Replication. (Best Paper) , 2008, NSDI.

[19]  Marek Olszewski,et al.  Kendo: efficient deterministic multithreading in software , 2009, ASPLOS.

[20]  Yuanyuan Zhou,et al.  PRES: probabilistic replay with execution sketching on multiprocessors , 2009, SOSP '09.

[21]  Ion Stoica,et al.  ODR: output-deterministic replay for multicore debugging , 2009, SOSP '09.

[22]  Satish Narayanasamy,et al.  Respec: efficient online multiprocessor replayvia speculation and external determinism , 2010, ASPLOS XV.

[23]  Zhiqiang Ma,et al.  Ad Hoc Synchronization Considered Harmful , 2010, OSDI.

[24]  Brandon Lucia,et al.  DMP: Deterministic Shared-Memory Multiprocessing , 2010, IEEE Micro.

[25]  NiehJason,et al.  Transparent, lightweight application execution replay on commodity multiprocessor operating systems , 2010 .

[26]  Satish Narayanasamy,et al.  Respec: Efficient Online Multiprocessor Replay via Speculation and External Determinism , 2010, ASPLOS 2010.

[27]  Luis Ceze,et al.  Deterministic Process Groups in dOS , 2010, OSDI.

[28]  Dan Grossman,et al.  CoreDet: a compiler and runtime system for deterministic multithreaded execution , 2010, ASPLOS XV.

[29]  Emery D. Berger,et al.  Dthreads: efficient deterministic multithreading , 2011, SOSP.

[30]  Dan Grossman,et al.  RCDC: a relaxed consistency deterministic computer , 2011, ASPLOS XVI.

[31]  Junfeng Yang,et al.  Efficient deterministic multithreading through schedule relaxation , 2011, SOSP.

[32]  L. Ceze,et al.  The Deterministic Execution Hammer : How Well Does it Actually Pound Nails ? , 2011 .

[33]  Satish Narayanasamy,et al.  DoublePlay: parallelizing sequential logging and replay , 2011, ASPLOS XVI.

[34]  Satish Narayanasamy,et al.  Detecting and surviving data races using complementary schedules , 2011, SOSP.

[35]  André Schiper,et al.  JPaxos: State machine replication based on the Paxos protocol , 2011 .

[36]  David A. Wood,et al.  Calvin: Deterministic or not? Free will to choose , 2011, 2011 IEEE 17th International Symposium on High Performance Computer Architecture.

[37]  Peng Li,et al.  Paxos Replicated State Machines as the Basis of a High-Performance Data Store , 2011, NSDI.

[38]  Yang Wang,et al.  All about Eve: Execute-Verify Replication for Multi-Core Servers , 2012, OSDI.

[39]  Sen Hu,et al.  Efficient system-enforced deterministic parallelism , 2010, OSDI.

[40]  Pawel T. Wojciechowski,et al.  Hybrid Replication: State-Machine-Based and Deferred-Update Replication Schemes Combined , 2013, 2013 IEEE 33rd International Conference on Distributed Computing Systems.