Near-Miss Analysis and the Availability of Software Systems

Software failures often make systems unavailable, with consequences ranging from financial loss to loss of life. Preventing their recurrence is therefore essential. To this end, a post-mortem investigation of a software failure is usually conducted to identify its root cause. However, such investigations often lack efficiency and accuracy: they depend on the investigators' expertise and knowledge of the system, and are therefore subjective. Investigating a software failure can also be challenging because of the typically large volume of failure data, such as log entries, that must be scrutinised. To address this problem, near-miss analysis is proposed. Near-miss analysis is an incident investigation technique that detects indicators of a likely failure before the failure unfolds. Because these indicators, known as near misses, occur very close to the point of failure, they are most likely to point to its root cause. Near-miss analysis therefore offers an objective approach to root-cause analysis, based on the data collected from the near misses. The method proposed in this paper identifies near misses through pattern analysis of a software system's behaviour close to a failure. Its viability is demonstrated through an experiment.
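The abstract describes the method only at a high level ("pattern analysis of a software system's behaviour close to a failure"). To illustrate the underlying idea, the sketch below flags event types whose relative frequency rises within a fixed time window just before each failure. It is a simple frequency-lift heuristic chosen for illustration, not the authors' method. The log format `(timestamp, event_type)`, the window length, the `min_lift` threshold and the function names `in_pre_failure_window` and `near_miss_candidates` are all hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Tuple

# Hypothetical log record: (timestamp in seconds, event type).
Event = Tuple[float, str]


def in_pre_failure_window(t: float, failures: List[float], window: float) -> bool:
    """Return True if t falls in [f - window, f) for some failure time f."""
    return any(f - window <= t < f for f in failures)


def near_miss_candidates(
    events: Iterable[Event],
    failures: List[float],
    window: float,
    min_lift: float = 2.0,
) -> List[Tuple[str, float]]:
    """Rank event types over-represented shortly before a failure.

    lift = (share of the type among pre-failure events)
           / (smoothed share of the type among all other events)
    This is an illustrative heuristic, not the paper's pattern analysis.
    """
    near: Counter = Counter()  # event-type counts inside pre-failure windows
    far: Counter = Counter()   # event-type counts everywhere else
    for t, kind in events:
        if in_pre_failure_window(t, failures, window):
            near[kind] += 1
        else:
            far[kind] += 1

    n_near, n_far = sum(near.values()), sum(far.values())
    if n_near == 0:
        return []  # no events observed close to any failure

    vocab = len(set(near) | set(far))
    scored = []
    for kind, count in near.items():
        p_near = count / n_near
        # Add-one (Laplace) smoothing keeps the score finite for types
        # that never occur outside a pre-failure window.
        p_far = (far[kind] + 1) / (n_far + vocab)
        lift = p_near / p_far
        if lift >= min_lift:
            scored.append((kind, lift))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Toy data: a heartbeat every second, warnings shortly before a failure at t=20.
    events = [(float(t), "heartbeat") for t in range(20)]
    events += [(17.5, "disk_latency"), (18.0, "retry"), (18.5, "retry")]
    for kind, lift in near_miss_candidates(events, failures=[20.0], window=5.0):
        print(f"{kind}: lift {lift:.2f}")
```

On the toy data this prints `retry: lift 4.50` followed by `disk_latency: lift 2.25`, while `heartbeat` (lift about 0.70) is filtered out as routine behaviour. The event types that survive the threshold are candidate near misses to examine during root-cause analysis. A real implementation would need richer patterns (sequences, parameter values, timing) than single event counts.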

Jan H. P. Eloff, Madeleine Adrienne Bihina Bella