Software Health Management: A Short Review of Challenges and Existing Techniques

Modern spacecraft, like most other complex systems such as aircraft, automobiles, and chemical plants, rely increasingly on software, to the point where software failures have caused severe accidents and the loss of missions. Software failures during a manned mission can cause loss of life, so stringent requirements are imposed to make the software as safe and reliable as possible. Typically, verification and validation (V&V) has the task of ensuring that all software errors are found before the software is deployed and that the software always conforms to its requirements. Experience, however, shows that this gold standard of error-free software cannot be reached in practice. Even if the software alone is free of defects, its interaction with the hardware (e.g., with sensors or actuators) can cause problems, and unexpected operational conditions or changes in the environment may ultimately cause a software system to fail. Is there a way to surmount this problem? In most modern aircraft and many automobiles, hardware such as central electrical, mechanical, and hydraulic components is monitored by Integrated Vehicle Health Management (IVHM) systems. These systems can detect, isolate, and identify faults and failures, both those that have already occurred and those that are imminent. With the help of diagnostics and prognostics, appropriate mitigation strategies can be selected (replacement or repair, switching to redundant systems, etc.). In this short paper, we discuss some challenges and promising techniques for software health management (SWHM). In particular, we identify challenges unique to preventing software failure in systems that involve both software and hardware components. We then present our classifications of techniques related to SWHM. These classifications are based on dimensions of interest to both developers and users of the techniques, and we hope they provide a map for dealing with software faults and failures.
