Geo-replication is increasingly important for modern planetary-scale distributed systems, yet it comes with a specific challenge: latency, bounded by the speed of light. In particular, clients of a geo-replicated system must communicate with a leader, which must in turn communicate with other replicas: the wrong choice of leader may result in unnecessary round-trips across the globe. Classical protocols such as the celebrated Paxos have a single leader, making them unsuitable for serving widely dispersed clients. To address this issue, several all-leader geo-replication protocols have been proposed recently, in which every replica acts as a leader. However, because these protocols require coordination among all replicas, committing a client's request at some replica may incur the so-called "delayed commit" problem, which can introduce even higher latency than a classical single-leader majority-based protocol such as Paxos.
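To make this trade-off concrete, consider a back-of-the-envelope latency model (a sketch for illustration only; the site names and delay figures below are hypothetical, and real protocols involve more message steps). It contrasts a single-leader majority protocol, where a client pays a round-trip to the leader plus the leader's round-trip to a majority, with an all-leader protocol, where the client's nearest replica can commit only after hearing from every replica:

```python
SITES = ["EU", "US", "AP"]
# Hypothetical one-way inter-site delays in milliseconds.
DELAY = {("EU", "US"): 40, ("EU", "AP"): 120, ("US", "AP"): 80}

def d(a, b):
    """Symmetric one-way delay; zero within a site."""
    return 0 if a == b else DELAY.get((a, b)) or DELAY[(b, a)]

def single_leader(client, leader):
    # Client -> leader round-trip, plus the leader's round-trip to the
    # fastest majority (2 of 3 replicas, counting the leader itself).
    majority_rtt = sorted(2 * d(leader, s) for s in SITES)[len(SITES) // 2]
    return 2 * d(client, leader) + majority_rtt

def all_leader(client):
    # Client -> nearest replica round-trip, plus that replica's round-trip
    # to the farthest replica: it must wait for all of them ("delayed commit").
    nearest = min(SITES, key=lambda s: d(client, s))
    return 2 * d(client, nearest) + max(2 * d(nearest, s) for s in SITES)

for c in SITES:
    print(f"{c}: single-leader(EU)={single_leader(c, 'EU')}ms, "
          f"all-leader={all_leader(c)}ms")
# EU: 80 vs 240 (delayed commit hurts clients near the single leader);
# AP: 320 vs 240 (all-leader helps far-away clients).
```

Under this toy model, neither extreme dominates: which one is faster depends on where the clients are relative to the replicas.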
In this paper, we argue that the "right" choice of the number of leaders in a geo-replication protocol depends on the given replica configuration, and we propose Droopy, an optimization for state machine replication protocols that explores the space between single-leader and all-leader protocols by dynamically reconfiguring the leader set. We implement Droopy on top of Clock-RSM, a state-of-the-art all-leader protocol. Our evaluation on Amazon EC2 shows that, under typical imbalanced workloads, Droopy-enabled Clock-RSM efficiently reduces latency compared to native Clock-RSM, whereas in other cases the latency matches that of native Clock-RSM.
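As an illustration of the leader-set idea (a toy sketch under the simplified cost model above, not Droopy's actual reconfiguration algorithm), one can enumerate candidate leader sets and pick the one that minimizes workload-weighted commit latency; the model reduces to the single-leader case when the set has one member and to the all-leader case when it contains every site:

```python
from itertools import combinations

SITES = ["EU", "US", "AP"]
DELAY = {("EU", "US"): 40, ("EU", "AP"): 120, ("US", "AP"): 80}

def d(a, b):
    return 0 if a == b else DELAY.get((a, b)) or DELAY[(b, a)]

def commit_latency(client, leaders):
    # Client -> nearest leader; that leader waits for a majority of all
    # replicas AND for every other leader in the set. With one leader this
    # is the single-leader case; with all sites it is the all-leader
    # (delayed commit) case.
    l = min(leaders, key=lambda s: d(client, s))
    majority_rtt = sorted(2 * d(l, s) for s in SITES)[len(SITES) // 2]
    coord_rtt = max(2 * d(l, s) for s in leaders)
    return 2 * d(client, l) + max(majority_rtt, coord_rtt)

def best_leader_set(workload):
    # workload: share of requests originating at each site (sums to 1).
    candidates = [c for r in range(1, len(SITES) + 1)
                  for c in combinations(SITES, r)]
    return min(candidates, key=lambda L: sum(
        share * commit_latency(site, L) for site, share in workload.items()))

# As the workload's origin shifts, the preferred leader set follows it.
print(best_leader_set({"EU": 0.8, "US": 0.1, "AP": 0.1}))  # e.g. ('EU', 'US')
print(best_leader_set({"EU": 0.1, "US": 0.1, "AP": 0.8}))  # e.g. ('US', 'AP')
```

The brute-force search is only feasible for a handful of sites, but it captures the intuition: an imbalanced workload is best served by a small leader set placed near the traffic, and reconfiguring that set as the workload shifts is what a dynamic scheme exploits.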
[1] Leslie Lamport et al. Generalized Consensus and Paxos. 2005.
[2] Fernando Pedone et al. Clock-RSM: Low-Latency Inter-datacenter State Machine Replication Using Loosely Synchronized Physical Clocks. DSN, 2014.
[3] Louise E. Moser et al. The Totem single-ring ordering and membership protocol. TOCS, 1995.
[4] Leslie Lamport et al. Vertical Paxos and primary-backup replication. PODC '09, 2009.
[5] David G. Andersen et al. Paxos Quorum Leases: Fast Reads Without Sacrificing Writes. SoCC, 2014.
[6] Fernando Pedone et al. Ring Paxos: A high-throughput atomic broadcast protocol. DSN, 2010.
[7] Nancy A. Lynch et al. Impossibility of distributed consensus with one faulty process. PODS '83, 1983.
[8] Leslie Lamport et al. Fast Paxos. Distributed Computing, 2006.
[9] Brian F. Cooper. Spanner: Google's globally-distributed database. SYSTOR '13, 2013.
[10] Robert Griesemer et al. Paxos made live: an engineering perspective. PODC '07, 2007.
[11] Leslie Lamport et al. The part-time parliament. TOCS, 1998.
[12] Keith Marzullo et al. Mencius: Building Efficient Replicated State Machines for WANs. OSDI, 2008.
[13] David G. Andersen et al. There is more consensus in Egalitarian parliaments. SOSP, 2013.