Efficient and Deterministic Scheduling for Parallel State Machine Replication

Many services used in large-scale web applications should be able to tolerate faults without impacting their performance. State machine replication is a well-known approach to implementing fault-tolerant services, providing high availability and strong consistency. To boost the performance of state machine replication, recent proposals have introduced parallel execution of commands. In parallel state machine replication, an incoming command may or may not depend on other commands that are waiting for execution. Dependent commands must be processed in the same relative order at every replica to avoid inconsistencies, whereas independent commands can be executed in parallel and benefit from multi-core architectures. Since many application workloads are mostly composed of independent commands, these parallel models promise high throughput without sacrificing strong consistency. The efficient execution of commands in such environments, however, requires effective scheduling strategies. Existing approaches rely on dependency tracking based on pairwise comparison between commands, which introduces scheduling contention. In this paper, we propose a new and highly efficient scheduler for parallel state machine replication. Our scheduler considers batches of commands instead of individual commands. Moreover, each batch is augmented with a compact data structure that encodes the information about its commands needed for dependency analysis. We show, by means of experimental evaluation, that our technique outperforms existing schedulers for parallel state machine replication by a fairly large margin.
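
To make the batch-level idea concrete, the following is a minimal sketch and not the scheduler evaluated in the paper. The abstract does not specify the compact data structure, so the sketch assumes a fixed-size bitmap digest of the keys each batch reads and writes. The names `Batch`, `key_bit`, `conflicts`, `schedule`, and the constant `BITMAP_BITS` are hypothetical. Two batches are treated as dependent when their digests overlap on at least one written key, so a whole batch is compared with a few bitwise operations rather than command by command.

```python
import zlib
from dataclasses import dataclass

BITMAP_BITS = 1024  # assumed digest size; not taken from the paper


def key_bit(key: str) -> int:
    """Map a key to a single bit. crc32 is stable across processes,
    unlike the built-in hash(), so every replica computes the same digest."""
    return 1 << (zlib.crc32(key.encode("utf-8")) % BITMAP_BITS)


@dataclass
class Batch:
    commands: list   # (op, key) tuples, where op is "r" or "w"
    reads: int = 0   # bitmap of keys read by the batch
    writes: int = 0  # bitmap of keys written by the batch

    def __post_init__(self):
        # Build both digests once, when the batch is created.
        for op, key in self.commands:
            if op == "w":
                self.writes |= key_bit(key)
            else:
                self.reads |= key_bit(key)


def conflicts(a: Batch, b: Batch) -> bool:
    """Batches conflict if either one may write a key the other touches."""
    return bool(a.writes & (b.reads | b.writes) or b.writes & a.reads)


def schedule(batches: list) -> list:
    """For each batch, in delivery order, list the earlier batches it must wait for."""
    return [
        [j for j in range(i) if conflicts(batches[j], batch)]
        for i, batch in enumerate(batches)
    ]


batches = [
    Batch([("w", "x")]),
    Batch([("r", "x")]),
    Batch([("r", "x")]),
]
print(schedule(batches))  # [[], [0], [0]]
```

In this sketch, hash collisions can make the digest report a dependency that does not exist, but it never misses a real one. A false conflict only costs parallelism, while the dependency lists stay identical at every replica because they are derived solely from the delivery order and the deterministic digests.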
