Resilient Intrusion Tolerance through Proactive and Reactive Recovery

Previous works have studied how to use proactive recovery to build intrusion-tolerant replicated systems that are resilient to any number of faults, as long as recoveries are faster than an upper-bound on fault production assumed at system deployment time. In this paper, we propose a complementary approach that combines proactive recovery with services that allow correct replicas to react and recover replicas that they detect or suspect to be compromised. One key feature of our proactive-reactive recovery approach is that, despite recoveries, it guarantees the availability of the minimum amount of system replicas necessary to sustain system's correct operation. We design a proactive-reactive recovery service based on a hybrid distributed system model and show, as a case study, how this service can effectively be used to augment the resilience of an intrusion-tolerant firewall adequate for the protection of critical infrastructures.

[1]  Noah Treuhaft,et al.  Recovery Oriented Computing (ROC): Motivation, Definition, Techniques, and Case Studies , 2002 .

[2]  Yennun Huang,et al.  Software Implemented Fault Tolerance Technologies and Experience , 1993, FTCS.

[3]  Robbert van Renesse,et al.  COCA: a secure distributed online certification authority , 2002, Foundations of Intrusion Tolerant Systems, 2003 [Organically Assured and Survivable Information Systems].

[4]  DahlinMike,et al.  Separating agreement from execution for byzantine fault tolerant services , 2003 .

[5]  Miguel Correia,et al.  Intrusion-Tolerant Protection for Critical Infrastructures , 2007 .

[6]  Paulo Veríssimo,et al.  Distributed Systems for System Architects , 2001, Advances in Distributed Computing and Middleware.

[7]  尚弘 島影 National Institute of Standards and Technologyにおける超伝導研究及び生活 , 2001 .

[8]  Rafail Ostrovsky,et al.  How to withstand mobile virus attacks (extended abstract) , 1991, PODC '91.

[9]  Miguel Correia,et al.  Resilient Intrusion Tolerance through Proactive and Reactive Recovery , 2007 .

[10]  Christian Cachin,et al.  Secure INtrusion-Tolerant Replication on the Internet , 2002, Proceedings International Conference on Dependable Systems and Networks.

[11]  Paulo Veríssimo,et al.  How resilient are distributed f fault/intrusion-tolerant systems? , 2005, 2005 International Conference on Dependable Systems and Networks (DSN'05).

[12]  Kishor S. Trivedi,et al.  Analysis of software rejuvenation using Markov Regenerative Stochastic Petri Net , 1995, Proceedings of Sixth International Symposium on Software Reliability Engineering. ISSRE'95.

[13]  Andrea Bondavalli,et al.  Hidden Markov Models as a Support for Diagnosis: Formalization of the Problem and Synthesis of the Solution , 2006, 2006 25th IEEE Symposium on Reliable Distributed Systems (SRDS'06).

[14]  Fred B. Schneider,et al.  CODEX: a robust and secure secret distribution system , 2004, IEEE Transactions on Dependable and Secure Computing.

[15]  Miguel Castro,et al.  Practical byzantine fault tolerance and proactive recovery , 2002, TOCS.

[16]  B SchneiderFred Implementing fault-tolerant services using the state machine approach: a tutorial , 1990 .

[17]  Miguel Correia,et al.  Intrusion-Tolerant Architectures: Concepts and Design , 2002, WADS.

[18]  Dorothy E. Denning,et al.  An Intrusion-Detection Model , 1987, IEEE Transactions on Software Engineering.

[19]  Miguel Correia,et al.  CRUTIAL: The Blueprint of a Reference Critical Information Infrastructure Architecture , 2006, CRITIS.

[20]  Paulo Veríssimo,et al.  Hidden problems of asynchronous proactive recovery , 2007 .

[21]  Yennun Huang,et al.  Software rejuvenation: analysis, module and applications , 1995, Twenty-Fifth International Symposium on Fault-Tolerant Computing. Digest of Papers.

[22]  William H. Sanders,et al.  Automatic model-driven recovery in distributed systems , 2005, 24th IEEE Symposium on Reliable Distributed Systems (SRDS'05).

[23]  Sam Toueg,et al.  Unreliable failure detectors for reliable distributed systems , 1996, JACM.

[24]  Todd L. Heberlein,et al.  Network intrusion detection , 1994, IEEE Network.

[25]  Lui Sha,et al.  Aperiodic task scheduling for Hard-Real-Time systems , 2006, Real-Time Systems.

[26]  Paulo Veríssimo,et al.  On the Resilience of Intrusion-Tolerant Distributed Systems , 2006 .

[27]  Andreas Haeberlen,et al.  The Case for Byzantine Fault Detection , 2006, HotDep.

[28]  Rachid Guerraoui,et al.  Muteness Failure Detectors: Specification and Implementation , 1999, EDCC.

[29]  Michel Raynal,et al.  Consensus in Byzantine asynchronous systems , 2003, J. Discrete Algorithms.

[30]  Rachid Guerraoui,et al.  Encapsulating Failure Detection: From Crash to Byzantine Failures , 2002, Ada-Europe.

[31]  Paulo Veríssimo,et al.  Travelling through wormholes: a new look at distributed systems models , 2006, SIGA.

[32]  Miguel Correia,et al.  How Practical Are Intrusion-Tolerant Distributed Systems? , 2006 .