Deconstructing Paxos

The celebrated Paxos algorithm of Lamport implements a fault-tolerant deterministic service by replicating it over a distributed message-passing system. This paper deconstructs the algorithm by factoring out its fundamental algorithmic principles into two abstractions: an eventual leader election abstraction and an eventual register abstraction. In short, the leader election abstraction encapsulates the liveness property of Paxos, whereas the register abstraction encapsulates its safety property. Our deconstruction is faithful in that it preserves the resilience and efficiency of the original Paxos algorithm in terms of stable storage logs, message complexity, and communication steps. In a companion paper, we show how to use our abstractions to reconstruct powerful variants of Paxos.
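The split between liveness and safety described above can be illustrated with a small sketch. It is not the paper's specification, which this abstract does not give. Every name here (`RoundBasedRegister`, `read`, `write`, `propose`, `leader`, `max_attempts`) is hypothetical, and the register is a single local object standing in for what would really be a replicated, quorum-based implementation. The sketch assumes process identifiers 1..n, so that each process uses its own disjoint sequence of round numbers.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class RoundBasedRegister:
    """Hypothetical local stand-in for the register abstraction (safety)."""
    read_round: int = 0            # highest round that has read
    write_round: int = 0           # highest round that has written
    value: Optional[str] = None    # last written value, None if none yet

    def read(self, k: int) -> Tuple[bool, Optional[str]]:
        # Abort if an equal or higher round has already used the register.
        if k <= self.read_round or k <= self.write_round:
            return (False, None)
        self.read_round = k
        return (True, self.value)

    def write(self, k: int, v: str) -> bool:
        # Abort if a higher round has read or written in the meantime.
        if k < self.read_round or k < self.write_round:
            return False
        self.write_round = k
        self.value = v
        return True


def propose(pid: int, n: int, v: str,
            leader: Callable[[], int],
            reg: RoundBasedRegister,
            max_attempts: int = 100) -> Optional[str]:
    """Try to decide a value; only the current leader makes progress (liveness)."""
    k = pid  # rounds pid, pid + n, pid + 2n, ... are unique to this process
    for _ in range(max_attempts):
        if leader() == pid:
            ok, current = reg.read(k)
            if ok:
                # Adopt any value already written, otherwise use our own.
                chosen = current if current is not None else v
                if reg.write(k, chosen):
                    return chosen
        k += n  # move to this process's next round and retry
    return None  # gave up: never became leader or kept aborting
```

In this sketch, agreement rests entirely on the register: a later round always reads, and then rewrites, a value committed by an earlier round. The leader oracle only determines who gets to try, so a wrong or unstable leader can delay a decision but cannot cause two different values to be returned.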
