Intrusion-Tolerant Architectures: Concepts and Design

There is a significant body of research on distributed computing architectures, methodologies and algorithms, in the fields of both fault tolerance and security. Although the two fields took separate paths until recently, the problems they address are similar in nature. In classical dependability, fault tolerance has been the workhorse of many solutions. Classical security-related work, on the other hand, has with few exceptions favoured intrusion prevention. Intrusion tolerance (IT) is a new approach that has slowly emerged during the past decade and has recently gained impressive momentum. Instead of trying to prevent every single intrusion, the system allows intrusions to occur but tolerates them: it triggers mechanisms that prevent an intrusion from generating a system security failure. This paper describes the fundamental concepts behind IT, tracing their connection with classical fault tolerance and security. We discuss the main strategies and mechanisms for architecting IT systems, and report on recent advances in distributed IT system architectures.
