Software contributions to aircraft adverse events: Case studies and analyses of recurrent accident patterns and failure mechanisms

Abstract: Software is central to aircraft flight operation, and by the same token it plays an increasing role in aircraft incidents and accidents. Software-related errors have distinctive failure mechanisms, and their contributions to aircraft accident sequences are not properly understood or captured by traditional risk analysis techniques. To better understand these mechanisms, we analyze five recent aircraft accidents and incidents involving software. For each case, we identify the role of software and analyze its contributions to the sequence of events leading to the accident. We adopt a visualization tool based on the Sequential Timed Event Plotting (STEP) methodology to highlight the software's interaction with sensors and other aircraft subsystems, and its contributions to the incident or accident. The case studies enable an in-depth analysis of recurrent failure mechanisms and provide insight into the causal chains and patterns through which software contributes to adverse events. For example, the case studies illustrate how software-related failures can be context- or situation-dependent, arising in situations that may have been overlooked during software verification and validation or testing. The case studies also identify the critical role of flawed sensor inputs as a key determinant or trigger of “dormant” software defects. In some cases, we find that software features put in place to address certain risks under nominal operating conditions are the very ones that lead or contribute to accidents under off-nominal or unconsidered conditions. The case studies also demonstrate that software may comply with its requirements and still place the aircraft in a hazardous state or contribute to an adverse event. This result challenges the traditional notion, articulated in most standards, of software failure as non-compliance with requirements, and it invites a careful rethinking of this and related concepts. We provide a careful review of these terms (software error, fault, failure), propose a synthesis of recurrent patterns of software contributions to adverse events and their triggering mechanisms, and conclude with some preliminary recommendations for tackling them.
