How to Make Good Software 4. Fault Tolerance Engineering: from Requirements to Code Software Architecture and Fault Tolerance 5. Verification and Validation of Fault Tolerant Systems

1. Motivations for the Book

Building trustworthy systems is one of the main challenges faced by software developers, who have been concerned with dependability-related issues since the day the first system was built and deployed. Obviously, there have been plenty of changes since then, including the nature of faults and failures, the complexity of systems, the services they deliver and the way society uses them. But the need to deal with various threats (such as failed components, deteriorating environments, component mismatches, human mistakes, intrusions and software bugs) is still at the core of software and system research and development. As computers are now spreading into various new domains (including critical ones) and the complexity of modern systems is growing, achieving dependability remains central for system developers and users. Accepting that errors always happen, in spite of all the efforts to eliminate the faults that might cause them, is at the core of dependability. To this end, various fault tolerance mechanisms have been investigated by researchers and used in industry. Unfortunately, more often than not these solutions focus exclusively on the implementation (e.g. they are provided as middleware/OS services or libraries), ignoring other development phases, most importantly the earlier ones. This creates a dangerous gap between the requirement to build dependable (and fault tolerant) systems and the fact that it is not dealt with until the implementation step. One consequence is a growing number of reported situations in which fault tolerance means undermine overall system dependability because they are not used properly. We believe that fault tolerance needs to be explicitly incorporated into traditional software engineering theories and practices, and should become an integral part of all steps of software development.
As current software engineering practices tend to capture only normal behaviour, assuming that all faults can be removed during development, new software engineering methods and tools need to be developed to support explicit handling of abnormal situations. Moreover, every phase in the software development process needs to be enriched with phase-specific fault tolerance means. Generally speaking, integrating fault tolerance into software engineering requires:

• integrating fault tolerance means into system models starting from the early development phases (i.e. requirements and architecture);
• making fault tolerance-related decisions at each phase by explicit modelling of faults, fault tolerance means and dedicated redundant resources (with a specific focus on fault tolerant software architectures);
• ensuring correct transformations …
