The summer of 2016 was buzzing with intern activity at the VMware Research Group (VRG), with interns working alongside the entire research team and David Tennenhouse, Chief Research Officer of VMware. In this paper, we give a brief introduction to Flexible Paxos [4], one of the results of these internships. There were several other exciting outcomes; internships are a great way to participate in driving innovation at VMware!

Flexible Paxos introduces a surprising observation concerning the foundations of distributed computing. The observation revisits the basic requirements of Paxos [7, 8], Lamport’s widely adopted algorithmic foundation for fault tolerance and replication, and a pinnacle of his Turing Award [1]. Since its publication, Paxos has been widely built upon in teaching, research and production systems.

Paxos implements a fault-tolerant state machine among a group of nodes. At its core, Paxos uses two phases, each of which requires agreement from a subset of nodes (known as a quorum) to proceed. Throughout this manuscript, we refer to the first phase as the leader election phase and to the second as the replication phase. The safety and liveness of Paxos are based on the guarantee that any two quorums will intersect. To satisfy this requirement, quorums are typically composed of any majority from a fixed set of nodes, although other quorum schemes have been proposed.

In practice, we usually wish to reach agreement over a sequence of commands, not just one. This is often referred to as the Multi-Paxos problem [3]. In Multi-Paxos, we use the leader election phase of Paxos to establish one node as the leader for all future commands, until it is replaced by another leader. We use the replication phase of Paxos to agree on a series of commands, one at a time. To commit a command, the leader must always communicate with at least a quorum of nodes and wait for them to accept the value.

In the Flexible Paxos work, we observe that Paxos is conservative:
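As a small illustration of the quorum intersection requirement described above (and not of the Flexible Paxos result itself), the following sketch enumerates all majority quorums over a fixed set of nodes and checks that every pair of them shares at least one node. The node names and the helper functions `majority_quorums` and `all_pairs_intersect` are hypothetical and introduced only for this example.

```python
from itertools import combinations


def majority_quorums(nodes):
    """Return every subset of `nodes` that contains a strict majority."""
    ordered = sorted(nodes)
    threshold = len(ordered) // 2 + 1  # smallest size that is a majority
    return [
        frozenset(subset)
        for size in range(threshold, len(ordered) + 1)
        for subset in combinations(ordered, size)
    ]


def all_pairs_intersect(quorums):
    """Check that every pair of quorums shares at least one node."""
    return all(q1 & q2 for q1, q2 in combinations(quorums, 2))


# Hypothetical five-node cluster.
nodes = {"n1", "n2", "n3", "n4", "n5"}
quorums = majority_quorums(nodes)
majorities_intersect = all_pairs_intersect(quorums)

# Two groups smaller than a majority can be disjoint, so they are not
# guaranteed to intersect.
non_majority = [frozenset({"n1", "n2"}), frozenset({"n3", "n4"})]
non_majority_intersect = all_pairs_intersect(non_majority)

print(majorities_intersect, non_majority_intersect)
```

Any two majorities of the same node set must overlap because their sizes sum to more than the number of nodes, which is why majority quorums are the usual way to meet the intersection requirement.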
[1] Moni Naor et al. The Load, Capacity, and Availability of Quorum Systems. SIAM J. Comput., 1998.
[2] Brett D. Fleisch et al. The Chubby lock service for loosely-coupled distributed systems. OSDI '06, 2006.
[3] Dahlia Malkhi et al. Flexible Paxos: Quorum Intersection Revisited. OPODIS, 2016.
[4] Mahadev Konar et al. ZooKeeper: Wait-free Coordination for Internet-scale Systems. USENIX Annual Technical Conference, 2010.
[5] Robert Griesemer et al. Paxos made live: an engineering perspective. PODC '07, 2007.
[6] Flavio Paiva Junqueira et al. Zab: High-performance broadcast for primary-backup systems. 2011 IEEE/IFIP 41st International Conference on Dependable Systems & Networks (DSN), 2011.
[7] Robbert van Renesse et al. Paxos Made Moderately Complex. ACM Comput. Surv., 2015.
[8] Leslie Lamport et al. The part-time parliament. TOCS, 1998.
[9] Barbara Liskov et al. Viewstamped Replication Revisited. 2012.
[10] Leslie Lamport et al. Paxos Made Simple. 2001.
[11] John K. Ousterhout et al. In Search of an Understandable Consensus Algorithm. USENIX Annual Technical Conference, 2014.
[12] Fred B. Schneider et al. Implementing fault-tolerant services using the state machine approach: a tutorial. CSUR, 1990.
[13] Barbara Liskov et al. Viewstamped Replication: A New Primary Copy Method to Support Highly-Available Distributed Systems. PODC '88, 1988.