Time-Constraint-Aware Optimization of Assertions in Embedded Software

Technology shrinking and sensitization have led to more and more transient faults in embedded systems. Transient faults are intermittent and non-predictable faults caused by external events, such as energetic particles striking the circuits. These faults do not cause permanent damages, but may affect the running applications. One way to ensure the correct execution of these embedded applications is to keep debugging and testing even after shipping of the systems, complemented with recovery/restart options. In this context, the executable assertions that have been widely used in the development process for design validation can be deployed again in the final product. In this way, the application will use the assertion to monitor itself under the actual execution and will not allow erroneous out-of-the-specification behavior to manifest themselves. This kind of software-level fault tolerance may represent a viable solution to the problem of developing commercial off-the-shelf embedded systems with dependability requirements. But software-level fault tolerance comes at a computational cost, which may affect time-constrained applications. Thus, the executable assertions shall be introduced at the best possible points in the application code, in order to satisfy timing constraints, and to maximize the error detection efficiency. We present an approach for optimization of executable assertion placement in time-constrained embedded applications for the detection of transient faults. In this work, assertions have different characteristics such as tightness, i.e., error coverage, and performance degradation. Taking into account these properties, we have developed an optimization methodology, which identifies candidate locations for assertions and selects a set of optimal assertions with the highest tightness at the lowest performance degradation. The set of selected assertions is guaranteed to respect the real-time deadlines of the embedded application. Experimental results have shown the effectiveness of the proposed approach, which provides the designer with a flexible infrastructure for the analysis of time-constrained embedded applications and transient-fault-oriented executable assertions.

[1]  Yue Lu,et al.  Statistical-Based Response-Time Analysis of Systems with Execution Dependencies between Tasks , 2010, 2010 15th IEEE International Conference on Engineering of Complex Computer Systems.

[2]  Andreas Gerstinger,et al.  Automated software diversity for hardware fault detection , 2009, 2009 IEEE Conference on Emerging Technologies & Factory Automation.

[3]  Martin Hiller,et al.  Executable assertions for detecting data errors in embedded control systems , 2000, Proceeding International Conference on Dependable Systems and Networks. DSN 2000.

[4]  Carl E. Baum,et al.  From the electromagnetic pulse to high-power electromagnetics , 1992, Proc. IEEE.

[5]  Edward J. McCluskey,et al.  Error detection by duplicated instructions in super-scalar processors , 2002, IEEE Trans. Reliab..

[6]  Israel Koren,et al.  Fault-Tolerant Systems , 2007 .

[7]  Leon Lantz,et al.  Soft errors induced by alpha particles , 1996, IEEE Trans. Reliab..

[8]  Jacob A. Abraham,et al.  CEDA: control-flow error detection through assertions , 2006, 12th IEEE International On-Line Testing Symposium (IOLTS'06).

[9]  Alan D. George,et al.  RapidIO for radar processing in advanced space systems , 2007, TECS.

[10]  Johan Karlsson,et al.  GOOFI: generic object-oriented fault injection tool , 2003, 2003 International Conference on Dependable Systems and Networks, 2003. Proceedings..

[11]  Roman Obermaisser,et al.  Out-of-norm assertions [diagnostic mechanism] , 2005, 11th IEEE Real Time and Embedded Technology and Applications Symposium.

[12]  Cecilia Metra,et al.  Transient Fault and Soft Error On-die Monitoring Scheme , 2010, 2010 IEEE 25th International Symposium on Defect and Fault Tolerance in VLSI Systems.

[13]  Alberto L. Sangiovanni-Vincentelli,et al.  Fault-tolerant platforms for automotive safety-critical applications , 2003, CASES '03.

[14]  James L. Walsh,et al.  IBM experiments in soft fails in computer electronics (1978-1994) , 1996, IBM J. Res. Dev..

[15]  David I. August,et al.  SWIFT: software implemented fault tolerance , 2005, International Symposium on Code Generation and Optimization.

[16]  Heidrun Engel,et al.  Data flow transformations to detect results which are corrupted by hardware faults , 1996, Proceedings. IEEE High-Assurance Systems Engineering Workshop (Cat. No.96TB100076).

[17]  Marco Torchiano,et al.  A source-to-source compiler for generating dependable software , 2001, Proceedings First IEEE International Workshop on Source Code Analysis and Manipulation.

[18]  Niraj K. Jha,et al.  Fault-tolerant computer system design , 1996, IEEE Parallel & Distributed Technology: Systems & Applications.

[19]  I. Coorporation,et al.  Using the rdtsc instruction for performance monitoring , 1997 .

[20]  Michael Nicolaidis,et al.  Soft Errors in Modern Electronic Systems , 2010 .

[21]  Suku Nair,et al.  Design and Evaluation of System-Level Checks for On-Line Control Flow Error Detection , 1999, IEEE Trans. Parallel Distributed Syst..

[22]  Alfredo Benso,et al.  A C/C++ source-to-source compiler for dependable applications , 2000, Proceeding International Conference on Dependable Systems and Networks. DSN 2000.

[23]  Dhiraj K. Pradhan,et al.  Fault-tolerant computing : theory and techniques , 1986 .

[24]  Edward J. McCluskey,et al.  Error detection by selective procedure call duplication for low energy consumption , 2002, IEEE Trans. Reliab..

[25]  Henrique Madeira,et al.  Experimental evaluation of the fail-silent behaviour in programs with consistency checks , 1996, Proceedings of Annual Symposium on Fault Tolerant Computing.

[26]  James M. Bieman,et al.  Improving software testability with assertion insertion , 1994, Proceedings., International Test Conference.

[27]  Massimo Violante,et al.  Software-Implemented Hardware Fault Tolerance , 2010 .

[28]  Eric Rotenberg,et al.  AR-SMT: a microarchitectural approach to fault tolerance in microprocessors , 1999, Digest of Papers. Twenty-Ninth Annual International Symposium on Fault-Tolerant Computing (Cat. No.99CB36352).

[29]  Michael Paul Kowalski,et al.  USA experiment on the ARGOS satellite: a low-cost instrument for timing x-ray binaries , 1994, Optics & Photonics.

[30]  R. Velazco,et al.  Experimentally evaluating an automatic approach for generating safety-critical software with respect to transient errors , 2000 .

[31]  Giovanni Squillero,et al.  RT-Level ITC'99 Benchmarks and First ATPG Results , 2000, IEEE Des. Test Comput..

[32]  Hiroyuki Sugiyama,et al.  A 1.3 GHz fifth generation SPARC64 microprocessor , 2003, 2003 IEEE International Solid-State Circuits Conference, 2003. Digest of Technical Papers. ISSCC..

[33]  Jeffrey M. Voas,et al.  Putting assertions in their place , 1994, Proceedings of 1994 IEEE International Symposium on Software Reliability Engineering.

[34]  Neeraj Suri,et al.  On the placement of software mechanisms for detection of data errors , 2002, Proceedings International Conference on Dependable Systems and Networks.

[35]  Alberto L. Sangiovanni-Vincentelli,et al.  Embedded System Design for Automotive Applications , 2007, Computer.

[36]  Y. C. Yeh,et al.  Triple-triple redundant 777 primary flight computer , 1996, 1996 IEEE Aerospace Applications Conference. Proceedings.

[37]  Edward J. McCluskey,et al.  ED4I: Error Detection by Diverse Data and Duplicated Instructions , 2002, IEEE Trans. Computers.

[38]  Massimo Violante,et al.  Soft-error detection using control flow assertions , 2003, Proceedings 18th IEEE Symposium on Defect and Fault Tolerance in VLSI Systems.

[39]  E. Normand Single event upset at ground level , 1996 .

[40]  Pascal Fradet,et al.  Implementing fault-tolerance in real-time programs by automatic program transformations , 2008, TECS.

[41]  Shubhendu S. Mukherjee,et al.  Transient fault detection via simultaneous multithreading , 2000, Proceedings of 27th International Symposium on Computer Architecture (IEEE Cat. No.RS00201).

[42]  Edward J. McCluskey,et al.  Control-flow checking by software signatures , 2002, IEEE Trans. Reliab..

[43]  Kewal K. Saluja,et al.  Fault tolerance through re-execution in multiscalar architecture , 2000, Proceeding International Conference on Dependable Systems and Networks. DSN 2000.

[44]  D. Rossi,et al.  Latch Susceptibility to Transient Faults and New Hardening Approach , 2007, IEEE Transactions on Computers.

[45]  Jakob Engblom,et al.  The worst-case execution-time problem—overview of methods and survey of tools , 2008, TECS.

[46]  G. C. Messenger,et al.  The effects of radiation on electronic systems , 1986 .

[47]  R.C. Baumann,et al.  Radiation-induced soft errors in advanced semiconductor technologies , 2005, IEEE Transactions on Device and Materials Reliability.

[48]  Grzegorz Jablonski,et al.  The automatic implementation of Software Implemented Hardware Fault Tolerance algorithms as a radiation-induced soft errors mitigation technique , 2008, 2008 IEEE Nuclear Science Symposium Conference Record.

[49]  Kewal K. Saluja,et al.  A study of time-redundant fault tolerance techniques for high-performance pipelined computers , 1989, [1989] The Nineteenth International Symposium on Fault-Tolerant Computing. Digest of Papers.

[50]  Petru Eles,et al.  Scheduling with bus access optimization for distributed embedded systems , 2000, IEEE Trans. Very Large Scale Integr. Syst..

[51]  N. Seifert,et al.  Comparison of alpha-particle and neutron-induced combinational and sequential logic error rates at the 32nm technology node , 2009, 2009 IEEE International Reliability Physics Symposium.

[52]  Todd M. Austin,et al.  DIVA: a reliable substrate for deep submicron microarchitecture design , 1999, MICRO-32. Proceedings of the 32nd Annual ACM/IEEE International Symposium on Microarchitecture.

[53]  Jacob A. Abraham,et al.  Algorithm-Based Fault Tolerance for Matrix Operations , 1984, IEEE Transactions on Computers.

[54]  R. Baumann Soft errors in advanced semiconductor devices-part I: the three radiation sources , 2001 .

[55]  Nancy G. Leveson,et al.  An investigation of the Therac-25 accidents , 1993, Computer.

[56]  Stephen S. Yau,et al.  An Approach to Concurrent Control Flow Checking , 1980, IEEE Transactions on Software Engineering.

[57]  Franco Fummi,et al.  HIFSuite: Tools for HDL Code Conversion and Manipulation , 2010, 2010 IEEE International High Level Design Validation and Test Workshop (HLDVT).

[58]  Marco Torchiano,et al.  Soft-error detection through software fault-tolerance techniques , 1999, Proceedings 1999 IEEE International Symposium on Defect and Fault Tolerance in VLSI Systems (EFT'99).