Compiler-guided register reliability improvement against soft errors

With the scaling of technology, transient errors caused by external particle strikes have become a critical challenge for microprocessor design. As embedded processors are widely used in reliability-sensitive environments, it becomes increasingly important to develop cost-effective techniques to improve the processor reliability against soft errors. This paper focuses on studying the register file immunity against soft errors since modern processors typically employ a large number of registers, which are accessed very frequently. As a result, soft errors occurred in registers can easily propagate to functional units or the memory system, leading to silent data error (SDC) or system crash.To develop cost-effective techniques to fight soft errors for embedded processors, the first step is to understand the register file susceptibility to soft errors and its impact on the system reliability accurately. Toward this goal, this paper proposes the concept of register vulnerability factor (RVF) to characterize the probability that register transient errors can escape the register file and thus potentially impact the system reliability. Built upon the RVF concept, we then propose two cost-effective compiler-guided techniques to improve the register file reliability by lowering the RVF value. Our experiments indicate that on average, the RVF can be reduced to 9.1% and 9.5% by the hyperblock-based instruction re-scheduling and the reliability-oriented register assignment respectively, which can potentially lower the reliability cost significantly while protecting register files against transient errors.

[1]  Chin-Long Chen,et al.  Error-Correcting Codes for Semiconductor Memory Applications: A State-of-the-Art Review , 1984, IBM J. Res. Dev..

[2]  M. Tremblay,et al.  Support for fault tolerance in VLSI processors , 1989, IEEE International Symposium on Circuits and Systems,.

[3]  Scott A. Mahlke,et al.  Effective compiler support for predicated execution using the hyperblock , 1992, MICRO 25.

[4]  Scott Mahlke,et al.  Effective compiler support for predicated execution using the hyperblock , 1992, MICRO 1992.

[5]  Timothy J. Dell,et al.  A white paper on the benefits of chipkill-correct ecc for pc server main memory , 1997 .

[6]  Miodrag Potkonjak,et al.  MediaBench: a tool for evaluating and synthesizing multimedia and communications systems , 1997, Proceedings of 30th Annual International Symposium on Microarchitecture.

[7]  Steven S. Muchnick,et al.  Advanced Compiler Design and Implementation , 1997 .

[8]  Chung-Ho Chen,et al.  Fault Containment in Cache Memories for TMR Redundant Processor Systems , 1999, IEEE Trans. Computers.

[9]  Arun K. Somani,et al.  Area efficient architectures for information integrity in cache memories , 1999, ISCA.

[10]  Todd M. Austin,et al.  DIVA: a reliable substrate for deep submicron microarchitecture design , 1999, MICRO-32. Proceedings of the 32nd Annual ACM/IEEE International Symposium on Microarchitecture.

[11]  Shubhendu S. Mukherjee,et al.  Transient fault detection via simultaneous multithreading , 2000, Proceedings of 27th International Symposium on Computer Architecture (IEEE Cat. No.RS00201).

[12]  Wei Zhang,et al.  Exploiting VLIW schedule slacks for dynamic and leakage energy reduction , 2001, MICRO.

[13]  Dual use of superscalar datapath for transient-fault detection and recovery , 2001, MICRO.

[14]  Arun K. Somani,et al.  Soft error sensitivity characterization for microprocessor dependability enhancement strategy , 2002, Proceedings International Conference on Dependable Systems and Networks.

[15]  Massimo Violante,et al.  An accurate analysis of the effects of soft errors in the instruction and data caches of a pipelined microprocessor , 2003, 2003 Design, Automation and Test in Europe Conference and Exhibition.

[16]  Shubhendu S. Mukherjee,et al.  A systematic methodology to compute the architectural vulnerability factors for a high-performance microprocessor , 2003, Proceedings. 36th Annual IEEE/ACM International Symposium on Microarchitecture, 2003. MICRO-36..

[17]  Sanjay J. Patel,et al.  Characterizing the effects of transient faults on a high-performance processor pipeline , 2004, International Conference on Dependable Systems and Networks, 2004.

[18]  Scott A. Mahlke,et al.  The superblock: An effective technique for VLIW and superscalar compilation , 1993, The Journal of Supercomputing.

[19]  Mahmut T. Kandemir,et al.  Increasing register file immunity to transient errors , 2005, Design, Automation and Test in Europe.