Software Implemented Detection and Recovery of Soft Errors in a Brake-by-Wire System

This paper presents an experimental study of the impact of soft errors in a prototype brake-by-wire system. To emulate the effects of soft errors, we injected single bit-flips into "live" data in the architected state of a MPC565 microcontroller. We first describe the results of an error injection campaign with a brake-by-wire controller in which hardware exceptions are the only means for error detection. In this campaign, 30% of the injected errors passed undetected and caused the controller to produce erroneous outputs to the brake actuator. Of these, 15% resulted in critical failures. An analysis showed that a majority of the critical failures were caused by errors affecting either the stack pointer or the controller's integrator. Hence, we designed two software implemented error handling mechanisms that protect the stack pointer and the integrator state, inducing an overhead of 4% in data and 8% in speed. A second error injection campaign showed that these mechanisms reduced the proportion of critical failures one order of magnitude, from 4.6% to 0.4% of the injected soft errors.

[1]  Klaus Grimm Software technology in an automotive company - major challenges , 2003, 25th International Conference on Software Engineering, 2003. Proceedings..

[2]  Johan Karlsson,et al.  GOOFI: generic object-oriented fault injection tool , 2003, 2003 International Conference on Dependable Systems and Networks, 2003. Proceedings..

[3]  J. Tschanz,et al.  Neutron soft error rate measurements in a 90-nm CMOS process and scaling trends in SRAM from 0.25-/spl mu/m to 90-nm generation , 2003, IEEE International Electron Devices Meeting 2003.

[4]  Gerard J. Holzmann,et al.  The Model Checker SPIN , 1997, IEEE Trans. Software Eng..

[5]  Mário Zenha Rela,et al.  A study of failure models in feedback control systems , 2001, 2001 International Conference on Dependable Systems and Networks.

[6]  Henrique Madeira,et al.  Practical issues in the use of ABFT and a new failure model , 1998, Digest of Papers. Twenty-Eighth Annual International Symposium on Fault-Tolerant Computing (Cat. No.98CB36224).

[7]  Johan Karlsson,et al.  Reducing critical failures for control algorithms using executable assertions and best effort recovery , 2001, 2001 International Conference on Dependable Systems and Networks.

[8]  Johan Karlsson,et al.  On the design of robust integrators for fail-bounded control systems , 2003, 2003 International Conference on Dependable Systems and Networks, 2003. Proceedings..

[9]  Carl E. Landwehr,et al.  Basic concepts and taxonomy of dependable and secure computing , 2004, IEEE Transactions on Dependable and Secure Computing.

[10]  Gurindar S. Sohi,et al.  Dynamic dead-instruction detection and elimination , 2002, ASPLOS X.

[11]  Shekhar Y. Borkar,et al.  Designing reliable systems from unreliable components: the challenges of transistor variability and degradation , 2005, IEEE Micro.

[12]  Sanjay J. Patel,et al.  Y-branches: when you come to a fork in the road, take it , 2003, 2003 12th International Conference on Parallel Architectures and Compilation Techniques.

[13]  David I. August,et al.  SWIFT: software implemented fault tolerance , 2005, International Symposium on Code Generation and Optimization.

[14]  Y. Savaria,et al.  Software detection mechanisms providing full coverage against single bit-flip faults , 2004, IEEE Transactions on Nuclear Science.

[15]  Christopher Edwards,et al.  Anti-windup and bumpless-transfer schemes , 1998, Autom..

[16]  Mário Zenha Rela,et al.  On the use of disaster prediction for failure-tolerance in feedback control systems , 2002, Proceedings International Conference on Dependable Systems and Networks.

[17]  Johan Karlsson,et al.  Assembly-Level Pre-injection Analysis for Improving Fault Injection Efficiency , 2005, EDCC.