Hardened Paxos through Consistency Validation

As distributed systems are increasingly adopted for building applications, demand for reliability and availability has grown. These properties can be achieved through replication, using middleware algorithms that tolerate faults. Some classes of faults, such as arbitrary faults, are harder to tolerate and lead to more complex, resource-intensive algorithms that are often impractical. We propose and experiment with the use of consistency validation techniques to harden a Paxos implementation that tolerates only benign faults, enabling it to detect and tolerate non-malicious arbitrary faults.
