Reliable data path design of VLIW processor cores with comprehensive error-coverage assessment

In this paper, an effective fault-tolerant framework offering very high error coverage with zero detection latency is proposed to protect the data paths of VLIW processor cores. The feature of zero detection latency is essential to real-time error-recovery. The proposed framework provides the error-handling schemes of varying hardware complexity, performance and error coverage to be selected. A case study with an experimental VLIW architecture implemented in VHDL was used to demonstrate the impacts of our technique on hardware overhead and performance degradation. The fault injection experiments were performed to characterize the effects of fault-occurring frequency as well as workload variations on the error coverage, and the permanent faults on the length of time spent for error-recovery. The results observed from the experiments show that our approach can well protect the VLIW data paths even in a very severe fault scenario. As a result, the proposed fault-tolerant VLIW core is quite suitable for the highly dependable embedded applications.

[1]  Prithviraj Banerjee,et al.  Low Cost Concurrent Error Detection in a VLIW Architecture Using Replicated Instructions , 1992, ICPP.

[2]  Janak H. Patel,et al.  Concurrent Error Detection in ALU's by Recomputing with Shifted Operands , 1982, IEEE Transactions on Computers.

[3]  Barry W. Johnson Design & analysis of fault tolerant digital systems , 1988 .

[4]  Lorenzo Alvisi,et al.  Modeling the effect of technology trends on the soft error rate of combinational logic , 2002, Proceedings International Conference on Dependable Systems and Networks.

[5]  A.L. White Transient faults and network reliability , 2004, 2004 IEEE Aerospace Conference Proceedings (IEEE Cat. No.04TH8720).

[6]  Onur Mutlu,et al.  Microarchitecture-based introspection: a technique for transient-fault tolerance in microprocessors , 2005, 2005 International Conference on Dependable Systems and Networks (DSN'05).

[7]  Edward J. McCluskey,et al.  Common-mode failures in redundant VLSI systems: a survey , 2000, IEEE Trans. Reliab..

[8]  Toshinori Sato,et al.  Evaluating low-cost fault-tolerance mechanism for microprocessors on multimedia applications , 2001, Proceedings 2001 Pacific Rim International Symposium on Dependable Computing.

[9]  Carl E. Landwehr,et al.  Basic concepts and taxonomy of dependable and secure computing , 2004, IEEE Transactions on Dependable and Secure Computing.

[10]  Nhon Quach,et al.  High Availability and Reliability in the Itanium Processor , 2000, IEEE Micro.

[11]  Vassilios A. Chouliaras,et al.  Study of the Effects of SEU-Induced Faults on a Pipeline Protected Microprocessor , 2007, IEEE Transactions on Computers.

[12]  Kishor S. Trivedi,et al.  Coverage Modeling for Dependability Analysis of Fault-Tolerant Systems , 1989, IEEE Trans. Computers.

[13]  N. Seifert,et al.  Robust system design with built-in soft-error resilience , 2005, Computer.

[14]  Shi-Jinn Horng,et al.  An Online Control Flow Check for VLIW Processor , 2008, 2008 14th IEEE Pacific Rim International Symposium on Dependable Computing.

[15]  F. Irom,et al.  Frequency dependence of single-event upset in advanced commercial PowerPC microprocessors , 2004, IEEE Transactions on Nuclear Science.

[16]  Byung Kook Kim,et al.  Task-scheduling strategies for reliable TMR controllers using task grouping and assignment , 2000, IEEE Trans. Reliab..

[17]  Edward J. McCluskey,et al.  Error detection by duplicated instructions in super-scalar processors , 2002, IEEE Trans. Reliab..

[18]  Peter Hazucha,et al.  Characterization of soft errors caused by single event upsets in CMOS processes , 2004, IEEE Transactions on Dependable and Secure Computing.

[19]  Cristian Constantinescu,et al.  Impact of deep submicron technology on dependability of VLSI circuits , 2002, Proceedings International Conference on Dependable Systems and Networks.

[20]  Arun K. Somani,et al.  REESE: a method of soft error detection in microprocessors , 2001, 2001 International Conference on Dependable Systems and Networks.

[21]  Kewal K. Saluja,et al.  Fault tolerance through re-execution in multiscalar architecture , 2000, Proceeding International Conference on Dependable Systems and Networks. DSN 2000.

[22]  Chaitali Chakrabarti,et al.  Memory Design and Exploration for Low Power, Embedded Systems , 1999 .

[23]  Manoj Franklin A study of time redundant fault tolerance techniques for superscalar processors , 1995, Proceedings of International Workshop on Defect and Fault Tolerance in VLSI.

[24]  Eric Rotenberg,et al.  AR-SMT: a microarchitectural approach to fault tolerance in microprocessors , 1999, Digest of Papers. Twenty-Ninth Annual International Symposium on Fault-Tolerant Computing (Cat. No.99CB36352).

[25]  Dhiraj K. Pradhan,et al.  Fault Injection: A Method for Validating Computer-System Dependability , 1995, Computer.

[26]  Todd M. Austin,et al.  DIVA: a reliable substrate for deep submicron microarchitecture design , 1999, MICRO-32. Proceedings of the 32nd Annual ACM/IEEE International Symposium on Microarchitecture.

[27]  Cristiana Bolchini A software methodology for detecting hardware faults in VLIW data paths , 2003, IEEE Trans. Reliab..