In Search of an Understandable Consensus Algorithm

Raft is a consensus algorithm for managing a replicated log. It produces a result equivalent to (multi-)Paxos, and it is as efficient as Paxos, but its structure is different from Paxos; this makes Raft more understandable than Paxos and also provides a better foundation for building practical systems. In order to enhance understandability, Raft separates the key elements of consensus, such as leader election, log replication, and safety, and it enforces a stronger degree of coherency to reduce the number of states that must be considered. Results from a user study demonstrate that Raft is easier for students to learn than Paxos. Raft also includes a new mechanism for changing the cluster membership, which uses overlapping majorities to guarantee safety.
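The idea of overlapping majorities can be illustrated with a minimal sketch. It assumes that, while membership is changing, a decision needs a separate majority from both the old and the new configuration. The function names `has_majority` and `joint_quorum` and the server IDs are hypothetical, chosen only for illustration; this is not the authors' implementation.

```python
def has_majority(acks, members):
    """Return True if strictly more than half of `members` appear in `acks`."""
    return len(set(acks) & set(members)) > len(members) // 2


def joint_quorum(acks, old_members, new_members):
    """During a membership change, require a majority of BOTH configurations.

    Any two sets of servers that each pass this check must share at least
    one server, so two conflicting decisions cannot both be accepted.
    """
    return has_majority(acks, old_members) and has_majority(acks, new_members)


# Hypothetical configurations: the cluster moves from {s1, s2, s3} to {s3, s4, s5}.
old = {"s1", "s2", "s3"}
new = {"s3", "s4", "s5"}

only_old = joint_quorum({"s1", "s2"}, old, new)        # majority of old only
only_new = joint_quorum({"s4", "s5"}, old, new)        # majority of new only
both = joint_quorum({"s1", "s3", "s4"}, old, new)      # majority of each
```

In this sketch neither the old majority alone nor the new majority alone can make a decision during the transition, which is the property that keeps the two configurations from deciding independently.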
