Simultaneous Fault Models for the Generation of Efficient Error Detection Mechanisms

The application of machine learning to software fault injection data has been shown to be an effective approach for the generation of efficient error detection mechanisms (EDMs). However, such approaches to the design of EDMs have invariably adopted a fault model with a single-fault assumption, limiting the practical relevance of the detectors and their evaluation. Software containing more than a single fault is commonplace, with prominent safety standards recognising that critical failures are often the result of unlikely or unforeseen combinations of faults. This paper addresses this shortcoming, demonstrating that it is possible to generate similarly efficient EDMs under more realistic fault models. In particular, it is shown that (i) efficient EDMs can be designed using fault data collected under models accounting for the occurrence of simultaneous faults, (ii) exhaustive fault injection under a simultaneous bit flip model can yield improvements to EDM efficiency, and (iii) exhaustive fault injection under a simultaneous bit flip model can be made non-exhaustive, reducing the resource costs of experimentation to practicable levels, without sacrificing resultant EDM efficiency.
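To give a sense of why exhaustive injection under a simultaneous bit flip model is costly, and what a non-exhaustive alternative might look like, the following is a minimal sketch. It assumes a fault is a set of bit positions flipped together in a single fixed-width word and that the non-exhaustive campaign is a uniform random sample; the abstract does not specify the paper's actual tooling or sampling strategy, and the function names here are hypothetical.

```python
import itertools
import random


def flip_bits(value, positions):
    """Return value with every bit position in `positions` inverted."""
    for p in positions:
        value ^= 1 << p
    return value


def exhaustive_faults(width, k):
    """Enumerate every way of flipping exactly k distinct bits in a width-bit word."""
    return list(itertools.combinations(range(width), k))


def sampled_faults(width, k, n, seed=0):
    """Draw n distinct k-bit faults uniformly at random (an assumed sampling scheme)."""
    rng = random.Random(seed)
    return rng.sample(exhaustive_faults(width, k), n)


if __name__ == "__main__":
    single = exhaustive_faults(32, 1)
    double = exhaustive_faults(32, 2)
    print(len(single), len(double))   # size of each exhaustive campaign for one word
    subset = sampled_faults(32, 2, 50)
    print(len(subset))                # size of the reduced campaign
    print(flip_bits(0, subset[0]))    # the corrupted value for one sampled fault
```

Even for one 32-bit word, the number of double flips is far larger than the number of single flips, and the gap widens with every injection location and time, which is the cost that a non-exhaustive campaign aims to contain.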
