Parsimonious service replication for tolerating malicious attacks in asynchronous environments

We consider the subject of tolerance of the most severe kind of faults, namely Byzantine faults, through state machine replication in asynchronous environments such as the Internet. In Byzantine-fault-tolerant (BFT) state machine replication, state consistency among the replicas of a service is maintained by first agreeing on the order of requests to be processed (agreement or atomic broadcast phase) and then executing the requests in the agreed-upon order (execution phase). We propose a methodology for constructing asynchronous BFT replication protocols that leverage perceived normal conditions for parsimony and do not compromise correctness even when such perceptions are inaccurate. Parsimony is to be as frugal as possible for a given metric of interest. We apply this methodology to obtain parsimonious protocols that achieve efficiency in three metrics: (1) overall resource use of request execution, (2) message complexity of atomic broadcast, and (3) latency degree of atomic broadcast. We then present a suite of group management protocols that allow for the dynamic change of the composition of the replication group. Our parsimonious protocols are designed to withstand corruptions of at most one-third of the replicas and do not require the removal of suspected faulty replicas in order to provide liveness. Such a design allows for the enforcement of very selective and conservative policies regarding changes to the replication group membership. We describe the implementation of the protocols within a reusable software framework called the Component-Based Framework for Intrusion Tolerance, or CoBFIT. We also present the experimental evaluation of our protocols in the context of a representative application in both LAN and WAN (Planetlab) settings under both fault-free and controlled fault injection scenarios.

[1]  Klaus Kursawe,et al.  Optimistic Byzantine agreement , 2002, 21st IEEE Symposium on Reliable Distributed Systems, 2002. Proceedings..

[2]  Louise E. Moser,et al.  The SecureRing group communication system , 2001, TSEC.

[3]  Silvio Micali,et al.  A Digital Signature Scheme Secure Against Adaptive Chosen-Message Attacks , 1988, SIAM J. Comput..

[4]  William H. Sanders,et al.  Quantifying the cost of providing intrusion tolerance in group communication systems , 2002, Proceedings International Conference on Dependable Systems and Networks.

[5]  Piotr Berman,et al.  Quick Atomic Broadcast (Extended Abstract) , 1993, WDAG.

[6]  Ran Canetti,et al.  Fast asynchronous Byzantine agreement with optimal resilience , 1993, STOC.

[7]  Leslie Lamport,et al.  Time, clocks, and the ordering of events in a distributed system , 1978, CACM.

[8]  Douglas C. Schmidt,et al.  Pattern-Oriented Software Architecture, Patterns for Concurrent and Networked Objects , 2013 .

[9]  Piotr Berman,et al.  Randomized distributed agreement revisited , 1993, FTCS-23 The Twenty-Third International Symposium on Fault-Tolerant Computing.

[10]  Abhinandan Das,et al.  SWIM: scalable weakly-consistent infection-style process group membership protocol , 2002, Proceedings International Conference on Dependable Systems and Networks.

[11]  Leslie Lamport,et al.  The part-time parliament , 1998, TOCS.

[12]  Michael K. Reiter A Secure Group Membership Protocol , 1996, IEEE Trans. Software Eng..

[13]  Michael Dahlin,et al.  Small byzantine quorum systems , 2002, Proceedings International Conference on Dependable Systems and Networks.

[14]  Victor Shoup,et al.  Random Oracles in Constantinople: Practical Asynchronous Byzantine Agreement Using Cryptography , 2000, Journal of Cryptology.

[15]  Victor Shoup,et al.  Optimistic Asynchronous Atomic Broadcast , 2005, ICALP.

[16]  Nancy A. Lynch,et al.  Impossibility of distributed consensus with one faulty process , 1985, JACM.

[17]  Sam Toueg,et al.  Unreliable failure detectors for reliable distributed systems , 1996, JACM.

[18]  Miguel Castro,et al.  Practical byzantine fault tolerance and proactive recovery , 2002, TOCS.

[19]  Yvo Desmedt,et al.  Society and Group Oriented Cryptography: A New Concept , 1987, CRYPTO.

[20]  Gene Tsudik,et al.  Admission control in peer groups , 2003, Second IEEE International Symposium on Network Computing and Applications, 2003. NCA 2003..

[21]  Mark Garland Hayden,et al.  The Ensemble System , 1998 .

[22]  André Schiper Early consensus in an asynchronous system with a weak failure detector , 1997, Distributed Computing.

[23]  Anna Lysyanskaya,et al.  Asynchronous verifiable secret sharing and proactive cryptosystems , 2002, CCS '02.

[24]  Michael K. Reiter,et al.  The Rampart Toolkit for Building High-Integrity Services , 1994, Dagstuhl Seminar on Distributed Systems.

[25]  Dahlia Malkhi,et al.  Secure reliable multicast protocols in a WAN , 2000, Distributed Computing.

[26]  Leslie Lamport,et al.  The Byzantine Generals Problem , 1982, TOPL.

[27]  Michael K. Reiter,et al.  An Architecture for Survivable Coordination in Large Distributed Systems , 2000, IEEE Trans. Knowl. Data Eng..

[28]  Fred B. Schneider,et al.  Implementing fault-tolerant services using the state machine approach: a tutorial , 1990, CSUR.

[29]  Alfred Menezes,et al.  Handbook of Applied Cryptography , 2018 .

[30]  Michael O. Rabin,et al.  Randomized byzantine generals , 1983, 24th Annual Symposium on Foundations of Computer Science (sfcs 1983).

[31]  Louise E. Moser,et al.  Byzantine Fault Detectors for Solving Consensus , 2003, Comput. J..

[32]  Michael K. Reiter,et al.  Unreliable intrusion detection in distributed computations , 1997, Proceedings 10th Computer Security Foundations Workshop.

[33]  Christian Cachin,et al.  Secure INtrusion-Tolerant Replication on the Internet , 2002, Proceedings International Conference on Dependable Systems and Networks.

[34]  Willy Zwaenepoel,et al.  Manetho: fault tolerance in distributed systems using rollback-recovery and process replication , 1994 .

[35]  Rachid Guerraoui,et al.  Muteness Failure Detectors: Specification and Implementation , 1999, EDCC.

[36]  Sam Toueg,et al.  Fault-tolerant broadcasts and related problems , 1993 .

[37]  Michael K. Reiter,et al.  Secure agreement protocols: reliable and atomic group multicast in rampart , 1994, CCS '94.

[38]  Miguel Correia,et al.  Intrusion-Tolerant Architectures: Concepts and Design , 2002, WADS.

[39]  Danny Dolev,et al.  Early stopping in Byzantine agreement , 1990, JACM.

[40]  Michael Dahlin,et al.  Minimal Byzantine Storage , 2002, DISC.

[41]  Arun Venkataramani,et al.  Separating agreement from execution for byzantine fault tolerant services , 2003, SOSP '03.

[42]  Miguel Castro,et al.  Using abstraction to improve fault tolerance , 2001, Proceedings Eighth Workshop on Hot Topics in Operating Systems.

[43]  Miguel Castro,et al.  BASE: using abstraction to improve fault tolerance , 2001, SOSP.

[44]  Fred B. Schneider,et al.  Distributed Trust: Supporting Fault-tolerance and Attack-tolerance , 2004 .

[45]  Kenneth P. Birman,et al.  Reliable communication in the presence of failures , 1987, TOCS.

[46]  Michael K. Reiter,et al.  Byzantine quorum systems , 1997, STOC '97.

[47]  Victor Shoup,et al.  Practical Threshold Signatures , 2000, EUROCRYPT.

[48]  Victor Shoup,et al.  Secure and efficient asynchronous broadcast protocols : (Extended abstract) , 2001, CRYPTO 2001.

[49]  Fred B. Schneider,et al.  The primary-backup approach , 1993 .

[50]  Luis F. G. Sarmenta Sabotage-tolerance mechanisms for volunteer computing systems , 2002, Future Gener. Comput. Syst..

[51]  Christian Cachin,et al.  Distributing trust on the Internet , 2001, 2001 International Conference on Dependable Systems and Networks.