Architecting Fault Tolerant Systems

Typical solutions address fault tolerance, and exception handling in particular, during the design and implementation phases of the software life cycle (e.g., Java and Windows NT exception handling). More recently, some researchers have advocated explicit exception handling solutions across the entire life cycle, and several solutions have been proposed for fault tolerance via exception handling at the software architecture and component levels. This paper describes how the two concepts of fault tolerance and software architecture have been integrated so far. It is structured in two parts (an overview of fault tolerance and exception handling, and the integration of fault tolerance into software architecture) and is based on a survey of architecting fault tolerant systems in which more than fifteen approaches were analyzed and classified. The paper concludes by identifying the issues that remain open and require deeper investigation.
