Fine-grained timing and control flow error checking for hard real-time task execution

Robustness and reliability are essential requirements of today's embedded systems. Especially errors in the control flow of a program, e.g. caused by transient errors, may lead to a faulty system behavior potentially with catastrophic consequences. Several methods for control flow checking have been proposed during the last decades. However, these techniques mostly focus on a correct sequence of application parts but not on the correct timing behavior of the control flow, which is essential for hard real-time systems. In this paper, we present a new approach which introduces fine-grained on-line timing checks for hard real-time systems combined with a lightweight control flow monitoring technique. The proposed approach is a hybrid hardware-software technique: We instrument the application code at compile-time by adding checkpoints, which contain temporal and logical information of the control flow. During run-time, a small hardware check unit connected to the core reads the instrumented data in order to verify the correctness of the application's control flow and timing behavior. The finegrained functionality of our mechanism allows a detection of many transient errors, associated with very low detection latency. It is no longer necessary to redundantly execute code in order to monitor anomalies. The hardware overhead is limited to a small check unit (only 0.5 % of chip space compared to the processor core); according to experimental results, the execution time overhead is only 10.6 % in the average case while the memory overhead is 12.3 %.

[1]  Marcus Rimén,et al.  A study of the effects of transient fault injection into a 32-bit RISC with built-in watchdog , 1992, [1992] Digest of Papers. FTCS-22: The Twenty-Second International Symposium on Fault-Tolerant Computing.

[2]  Edward J. McCluskey,et al.  Concurrent Error Detection Using Watchdog Processors - A Survey , 1988, IEEE Trans. Computers.

[3]  John Paul Shen,et al.  Processor Control Flow Monitoring Using Signatured Instruction Streams , 1987, IEEE Transactions on Computers.

[4]  R. Velazco,et al.  Experimentally evaluating an automatic approach for generating safety-critical software with respect to transient errors , 2000 .

[5]  Daniel P. Siewiorek,et al.  Effects of transient gate-level faults on program behavior , 1990, [1990] Digest of Papers. Fault-Tolerant Computing: 20th International Symposium.

[6]  Pascal Sainrat,et al.  OTAWA: An Open Toolbox for Adaptive WCET Analysis , 2010, SEUS.

[7]  Todd M. Austin,et al.  DIVA: a reliable substrate for deep submicron microarchitecture design , 1999, MICRO-32. Proceedings of the 32nd Annual ACM/IEEE International Symposium on Microarchitecture.

[8]  Alfred V. Aho,et al.  Compilers: Principles, Techniques, and Tools , 1986, Addison-Wesley series in computer science / World student series edition.

[9]  M. Rimen,et al.  Implicit signature checking , 1995, Twenty-Fifth International Symposium on Fault-Tolerant Computing. Digest of Papers.

[10]  Edward J. McCluskey,et al.  Control-flow checking by software signatures , 2002, IEEE Trans. Reliab..

[11]  Lothar Thiele,et al.  Design for Timing Predictability , 2004, Real-Time Systems.

[12]  Régis Leveugle,et al.  A new approach to control flow checking without program modification , 1991, [1991] Digest of Papers. Fault-Tolerant Computing: The Twenty-First International Symposium.

[13]  Luigi Carro,et al.  Hardware and Software Transparency in the Protection of Programs Against SEUs and SETs , 2008, J. Electron. Test..

[14]  Edward J. McCluskey,et al.  Error detection by duplicated instructions in super-scalar processors , 2002, IEEE Trans. Reliab..

[15]  Sascha Uhrig,et al.  How to Enhance a Superscalar Processor to Provide Hard Real-Time Capable In-Order SMT , 2010, ARCS.

[16]  Brian Randell,et al.  System structure for software fault tolerance , 1975, IEEE Transactions on Software Engineering.

[17]  Jürgen Teich,et al.  Concepts for run-time and error-resilient control flow checking of embedded RISC CPUs , 2009, Int. J. Auton. Adapt. Commun. Syst..

[18]  Algirdas Avizienis,et al.  The N-Version Approach to Fault-Tolerant Software , 1985, IEEE Transactions on Software Engineering.

[19]  Bernhard Fechner,et al.  Microcode with Embedded Timing Constraints , 2006, ARCS Workshops.

[20]  David J. Lu Watchdog Processors and Structural Integrity Checking , 1982, IEEE Transactions on Computers.

[21]  Sri Parameswaran,et al.  A hybrid hardware--software technique to improve reliability in embedded processors , 2011, TECS.

[22]  John Paul Shen,et al.  Continuous signature monitoring: low-cost concurrent detection of processor control errors , 1990, IEEE Trans. Comput. Aided Des. Integr. Circuits Syst..

[23]  Masood Namjoo,et al.  Techniques for Concurrent Testing of VLSI Processor Operation , 1982, ITC.

[24]  Suku Nair,et al.  Design and Evaluation of System-Level Checks for On-Line Control Flow Error Detection , 1999, IEEE Trans. Parallel Distributed Syst..

[25]  Riccardo Mariani,et al.  Towards functional-safe timing-dependable real-time architectures , 2011, 2011 IEEE 17th International On-Line Testing Symposium.

[26]  Jacob A. Abraham,et al.  CEDA: control-flow error detection through assertions , 2006, 12th IEEE International On-Line Testing Symposium (IOLTS'06).

[27]  David I. August,et al.  SWIFT: software implemented fault tolerance , 2005, International Symposium on Code Generation and Optimization.

[28]  Fabian Vargas,et al.  A new hybrid fault detection technique for systems-on-a-chip , 2006, IEEE Transactions on Computers.

[29]  Edward J. McCluskey,et al.  ED4I: Error Detection by Diverse Data and Duplicated Instructions , 2002, IEEE Trans. Computers.

[30]  Massimo Violante,et al.  Soft-error detection using control flow assertions , 2003, Proceedings 18th IEEE Symposium on Defect and Fault Tolerance in VLSI Systems.