A software methodology for detecting hardware faults in VLIW data paths

The proposed methodology aims to obtain VLIW processor data paths that can autonomously detect transient and permanent hardware faults while executing their applications. The approach operates on the compiled application software and inserts additional instructions that check the correctness of the computation against a failure in any one of the data path functional units. A software approach to hardware fault detection is attractive because it can be applied selectively, only to the critical applications executed on the VLIW architecture, so noncritical tasks suffer no execution delay. Furthermore, because the approach exploits the intrinsic redundancy of this class of architectures (their multiple parallel functional units), the data path needs no hardware modification and therefore no processor customization.
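The abstract does not spell out the exact instruction transformation. One common way to realize such software checks on a VLIW data path is instruction duplication with comparison: each operation is re-executed on a different functional unit, and a check instruction compares the two results. The Python sketch is a minimal, hypothetical model of that idea, not the authors' transformation. `Op`, `harden`, `run`, the `_dup` shadow registers, the `chk` opcode and the stuck-at-1 fault model are all illustrative assumptions. It executes operations sequentially rather than in parallel bundles and assumes the check instructions themselves are fault-free.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

MASK = 0xFFFFFFFF  # model 32-bit registers

# Operations every functional unit can perform in this toy model (assumption).
ALU = {
    "add": lambda a, b: (a + b) & MASK,
    "sub": lambda a, b: (a - b) & MASK,
    "mul": lambda a, b: (a * b) & MASK,
}


@dataclass
class Op:
    opcode: str  # "add", "sub", "mul", or "chk" (compare two registers)
    dst: str
    src1: str
    src2: str
    unit: int = 0  # functional unit the operation is bound to


def harden(program: List[Op], num_units: int) -> List[Op]:
    """Hypothetical pass: duplicate every ALU op on another unit and add a check."""
    assert num_units >= 2, "duplication needs at least two functional units"
    hardened: List[Op] = []
    for op in program:
        shadow_unit = (op.unit + 1) % num_units  # always a different unit
        shadow_dst = op.dst + "_dup"
        # Emit the copy first so it still reads the original sources even
        # when op.dst is also one of op's source registers.
        hardened.append(Op(op.opcode, shadow_dst, op.src1, op.src2, shadow_unit))
        hardened.append(op)
        hardened.append(Op("chk", "", op.dst, shadow_dst, op.unit))
    return hardened


def run(program: List[Op], regs: Dict[str, int],
        faulty_unit: Optional[int] = None, stuck_bit: int = 0) -> List[int]:
    """Execute ops in order; return the indices of failed check instructions.

    If faulty_unit is given, every result computed on that unit has bit
    `stuck_bit` forced to 1 (a permanent stuck-at-1 fault). Checks are
    modeled as fault-free.
    """
    errors: List[int] = []
    for i, op in enumerate(program):
        if op.opcode == "chk":
            if regs[op.src1] != regs[op.src2]:
                errors.append(i)
            continue
        result = ALU[op.opcode](regs[op.src1], regs[op.src2])
        if op.unit == faulty_unit:
            result |= 1 << stuck_bit
        regs[op.dst] = result
    return errors


if __name__ == "__main__":
    prog = harden([Op("add", "r3", "r1", "r2", unit=0),
                   Op("sub", "r4", "r3", "r1", unit=1)], num_units=2)
    print(run(prog, {"r1": 2, "r2": 4}))                 # []
    print(run(prog, {"r1": 2, "r2": 4}, faulty_unit=1))  # [2, 5]
```

Because the two copies of each operation run on distinct units, a fault confined to one unit appears as a mismatch at the following check. A stuck-at-1 bit is masked whenever the correct result already has that bit set: with `faulty_unit=0, stuck_bit=1`, only the second check fires. So in this model, whether a given fault is caught depends on the data as well as on the fault.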

Cristiana Bolchini et al., 2001, Proceedings 2001 IEEE International Symposium on Defect and Fault Tolerance in VLSI Systems.