Software health management: a necessity for safety critical systems

As software and software-intensive systems become increasingly ubiquitous, the impact of failures can be tremendous. In industries such as aerospace, medical devices, and automotive, such failures can cost lives or endanger mission success. Software faults can arise from the interaction between the software, the hardware, and the operating environment. Unanticipated environmental changes can lead to software anomalies that significantly affect the overall success of the mission. Latent coding errors can trigger faults at any time during system operation, even though significant effort has usually been expended on verification and validation (V&V) of the software system. It is therefore becoming increasingly apparent that pre-deployment V&V is not enough to guarantee that a complex software system meets all safety, security, and reliability requirements. Software Health Management (SWHM) is a new field concerned with the development of tools and technologies that enable automated detection, diagnosis, prediction, and mitigation of adverse events due to software anomalies while the system is in operation. The prognostic capability of SWHM to anticipate failures before they happen will yield safer and more dependable systems. This paper addresses the motivation, needs, and requirements of software health management as a new discipline and argues for its use in safety-critical applications.
