Reducing the Energy Footprint of a Distributed Consensus Algorithm

The Raft consensus algorithm is a recent distributed consensus algorithm that is both easier to understand and more straightforward to implement than the older Paxos algorithm. Its major limitation is its high energy footprint. Because it relies on majority voting to decide when to commit an update, Raft requires five participants to protect against two simultaneous failures. We propose two methods for reducing this energy footprint. Our first proposal adjusts the Raft quorums so that updates can proceed with as few as two servers, while electing a new leader requires a larger quorum. Our second proposal replaces one or two of the five Raft servers with witnesses, that is, lightweight servers that maintain the same metadata as the other servers but hold no data and can therefore run on very low-power hosts. We show that these substitutions have little impact on cluster availability but very different impacts on the risk of data loss.
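The quorum arithmetic behind these statements can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function names are hypothetical, and the intersection rule used for the first proposal (commit quorum plus election quorum must exceed the cluster size) is an assumption borrowed from standard quorum-consensus reasoning, since the abstract does not state the exact election quorum.

```python
def majority_quorum(n: int) -> int:
    """Smallest number of servers that forms a strict majority of n."""
    return n // 2 + 1


def tolerated_failures(n: int) -> int:
    """Servers that can fail while a majority of n remains reachable."""
    return n - majority_quorum(n)


def min_election_quorum(n: int, commit_quorum: int) -> int:
    """Smallest election quorum that overlaps every commit quorum.

    Assumption (not taken from the abstract): any election quorum must
    share at least one server with any commit quorum, so that a new
    leader always sees the latest committed update.
    """
    if not 1 <= commit_quorum <= n:
        raise ValueError("commit_quorum must be between 1 and n")
    return n - commit_quorum + 1


def quorums_intersect(n: int, commit_quorum: int, election_quorum: int) -> bool:
    """True if every commit quorum overlaps every election quorum."""
    return commit_quorum + election_quorum > n


if __name__ == "__main__":
    n = 5
    print("majority quorum:", majority_quorum(n))
    print("tolerated failures:", tolerated_failures(n))
    print("election quorum for commit quorum 2:", min_election_quorum(n, 2))
```

Under that assumption, a five-server cluster that commits updates with two servers would need four servers to elect a leader, which is the trade-off the first proposal describes: cheaper updates in exchange for a larger election quorum.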
