Architecting Dependable Systems

There is a significant body of research on distributed computing architectures, methodologies and algorithms, in both the fault tolerance and security fields. Whilst the two fields have taken separate paths until recently, the problems they solve are of a similar nature. In classical dependability, fault tolerance has been the workhorse of many solutions. Classical security-related work, on the other hand, has with few exceptions favoured intrusion prevention. Intrusion tolerance (IT) is a new approach that has slowly emerged during the past decade and has recently gained impressive momentum. Instead of trying to prevent every single intrusion, the approach allows intrusions but tolerates them: the system triggers mechanisms that prevent an intrusion from generating a system security failure. This paper describes the fundamental concepts behind IT, tracing their connection with classical fault tolerance and security. We discuss the main strategies and mechanisms for architecting IT systems, and report on recent advances in distributed IT system architectures.
