How resilient are distributed f fault/intrusion-tolerant systems?

Fault-tolerant protocols, asynchronous and synchronous alike, make stationary fault assumptions: at most f of the n nodes may fail. Whereas a synchronous protocol is expected to have a bounded execution time, an asynchronous one may execute for an arbitrary amount of time, possibly long enough for f+1 nodes to fail. This can compromise the safety of the protocol and, ultimately, of the system. Recent papers propose asynchronous protocols that tolerate any number of faults over the lifetime of the system, provided that at most f nodes become faulty during a given interval. This is achieved through proactive recovery, which consists of periodically rejuvenating the system. Proactive recovery in asynchronous systems, though a major breakthrough, has limitations that had not been identified before. In this paper, we introduce a system model expressive enough to represent these problems, which went unnoticed under the classical models. We introduce the predicate exhaustion-safe, meaning freedom from exhaustion-failures. Based on it, we predict the extent to which fault/intrusion-tolerant distributed systems (synchronous and asynchronous) can be made to work correctly. In particular, our model predicts that the correct behavior of asynchronous proactive recovery systems, as they exist today, cannot be guaranteed. To prove our point, we show how these problems affect an existing fault/intrusion-tolerant distributed system, the CODEX system. Having identified the problem, we suggest one (certainly not the only) way to tackle it.
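
The following Python sketch illustrates one informal reading of the argument above; it is not the formal model or definition from the paper. It assumes, purely for illustration, that the instants at which nodes fail are known, and treats an execution as exhaustion-safe when it ends before the (f+1)-th node fails. The names `exhaustion_time` and `is_exhaustion_safe` and the sample values are hypothetical.

```python
import math

def exhaustion_time(failure_times, f):
    """Instant of the (f+1)-th node failure, or infinity if fewer than f+1 nodes fail.

    Illustrative assumption: failure instants are known in advance.
    """
    ordered = sorted(failure_times)
    return ordered[f] if len(ordered) > f else math.inf

def is_exhaustion_safe(exec_end, failure_times, f):
    """True if the execution ends strictly before more than f nodes have failed."""
    return exec_end < exhaustion_time(failure_times, f)

# Hypothetical scenario: three nodes fail at t = 5, 12 and 30, and f = 1.
failures = [5.0, 12.0, 30.0]

# An execution with a known bound of 10 time units ends before the second failure.
bounded_ok = is_exhaustion_safe(10.0, failures, f=1)

# An execution that happens to last 20 time units outlives the assumption of at most f faults.
unbounded_bad = is_exhaustion_safe(20.0, failures, f=1)
```

Under this reading, a bounded execution time can be checked against the fault assumption ahead of time, whereas an execution with no upper bound on its duration offers no such check.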
