Paxos made live: an engineering perspective

We describe our experience in building a fault-tolerant data-base using the Paxos consensus algorithm. Despite the existing literature in the field, building such a database proved to be non-trivial. We describe selected algorithmic and engineering problems encountered, and the solutions we found for them. Our measurements indicate that we have built a competitive system.

[1]  J. von Neumann,et al.  Probabilistic Logic and the Synthesis of Reliable Organisms from Unreliable Components , 1956 .

[2]  Murray Hill,et al.  Yacc: Yet Another Compiler-Compiler , 1978 .

[3]  Leslie Lamport,et al.  The Byzantine Generals Problem , 1982, TOPL.

[4]  David R. Cheriton,et al.  Leases: an efficient fault-tolerant mechanism for distributed file cache consistency , 1989, SOSP '89.

[5]  Hanspeter Mössenböck,et al.  A Generator for Production Quality Compilers , 1990, CC.

[6]  Fred B. Schneider,et al.  Implementing fault-tolerant services using the state machine approach: a tutorial , 1990, CSUR.

[7]  Neeraj Suri,et al.  Advances in ULTRA-Dependable Distributed Systems , 1994 .

[8]  Russell W. Quong,et al.  ANTLR: A predicated‐LL(k) parser generator , 1995, Softw. Pract. Exp..

[9]  Chandramohan A. Thekkath,et al.  Petal: distributed virtual disks , 1996, ASPLOS VII.

[10]  Butler W. Lampson,et al.  How to Build a Highly Available System Using Consensus , 1996, WDAG.

[11]  Leslie Lamport,et al.  The part-time parliament , 1998, TOCS.

[12]  Barbara Liskov,et al.  Viewstamped Replication: A New Primary Copy Method to Support Highly-Available Distributed Systems , 1999, PODC '88.

[13]  Nancy A. Lynch,et al.  Revisiting the PAXOS algorithm , 1997, Theor. Comput. Sci..

[14]  Leslie Lamport,et al.  Paxos Made Simple , 2001 .

[15]  GhemawatSanjay,et al.  The Google file system , 2003 .

[16]  Marc Najork,et al.  Boxwood: Abstractions as the Foundation for Storage Infrastructure , 2004, OSDI.

[17]  Flaviu Cristian,et al.  Reaching agreement on processor-group membrship in synchronous distributed systems , 1991, Distributed Computing.

[18]  Brett D. Fleisch,et al.  The Chubby lock service for loosely-coupled distributed systems , 2006, OSDI '06.

[19]  Wilson C. Hsieh,et al.  Bigtable: A Distributed Storage System for Structured Data , 2006, TOCS.