Computing and Reducing Transient Error Propagation in Registers

Recent research indicates that transient errors will increasingly become a critical concern in microprocessor design. As embedded processors are widely used in reliability-critical or noisy environments, it is necessary to develop cost-effective fault-tolerant techniques to protect processors against transient errors. The register file is one of the critical components that can significantly affect microprocessor system reliability, since registers are typically accessed very frequently, and transient errors in registers can be easily propagated to functional units or the memory system, leading to silent data error (SDC) or system crash. This paper focuses on investigating the impact of register file soft errors on system reliability and developing cost-effective techniques to improve the register file immunity to soft errors. This paper proposes the register vulnerability factor (RVF) concept to characterize the probability that register transient errors can escape the register file and thus potentially affect system reliability. We propose an approach to compute the RVF based on register access patterns. In this paper, we also propose two compiler-directed techniques and a hybrid approach to improve register file reliability cost-effectively by lowering the RVF value. Our experiments indicate that on average, RVF can be reduced to 9.1% and 9.5% by the hyperblock-based instruction re-scheduling and the reliability-oriented register assignment respectively, which can potentially lower the reliability cost significantly, without sacrificing the register value integrity.

[1]  Aviral Shrivastava,et al.  A Compiler-Microarchitecture Hybrid Approach to Soft Error Reduction for Register Files , 2010, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems.

[2]  Anand Sivasubramaniam,et al.  Mechanisms for bounding vulnerabilities of processor structures , 2007, ISCA '07.

[3]  Chung-Ho Chen,et al.  Fault Containment in Cache Memories for TMR Redundant Processor Systems , 1999, IEEE Trans. Computers.

[4]  Aamer Jaleel,et al.  Explaining Cache SER Anomaly Using Relative DUE AVF Measurement , 2010, HPCA 2010.

[5]  서정연,et al.  Journal of Computing Science and Engineering(JCSE)의 국제화 작업 , 2010 .

[6]  Miodrag Potkonjak,et al.  MediaBench: a tool for evaluating and synthesizing multimedia and communications systems , 1997, Proceedings of 30th Annual International Symposium on Microarchitecture.

[7]  Steven S. Muchnick,et al.  Advanced Compiler Design and Implementation , 1997 .

[8]  Sanjay J. Patel,et al.  Characterizing the effects of transient faults on a high-performance processor pipeline , 2004, International Conference on Dependable Systems and Networks, 2004.

[9]  Scott A. Mahlke,et al.  The superblock: An effective technique for VLIW and superscalar compilation , 1993, The Journal of Supercomputing.

[10]  Todd M. Austin,et al.  DIVA: a reliable substrate for deep submicron microarchitecture design , 1999, MICRO-32. Proceedings of the 32nd Annual ACM/IEEE International Symposium on Microarchitecture.

[11]  Arun K. Somani,et al.  Soft error sensitivity characterization for microprocessor dependability enhancement strategy , 2002, Proceedings International Conference on Dependable Systems and Networks.

[12]  Massimo Violante,et al.  An accurate analysis of the effects of soft errors in the instruction and data caches of a pipelined microprocessor , 2003, 2003 Design, Automation and Test in Europe Conference and Exhibition.

[13]  Scott Mahlke,et al.  Effective compiler support for predicated execution using the hyperblock , 1992, MICRO 1992.

[14]  Joel S. Emer,et al.  The soft error problem: an architectural perspective , 2005, 11th International Symposium on High-Performance Computer Architecture.

[15]  Anand Sivasubramaniam,et al.  Characterizing the soft error vulnerability of multicores running multithreaded applications , 2010, SIGMETRICS '10.

[16]  Scott A. Mahlke,et al.  Effective compiler support for predicated execution using the hyperblock , 1992, MICRO 25.

[17]  M. Tremblay,et al.  Support for fault tolerance in VLSI processors , 1989, IEEE International Symposium on Circuits and Systems,.

[18]  Mahmut T. Kandemir,et al.  Increasing register file immunity to transient errors , 2005, Design, Automation and Test in Europe.

[19]  Youngjoong Ko,et al.  Issues and Empirical Results for Improving Text Classification , 2011, J. Comput. Sci. Eng..

[20]  Narayanan Vijaykrishnan,et al.  Analysis of soft error rate in flip-flops and scannable latches , 2003, IEEE International [Systems-on-Chip] SOC Conference, 2003. Proceedings..

[21]  Shubhendu S. Mukherjee,et al.  Transient fault detection via simultaneous multithreading , 2000, Proceedings of 27th International Symposium on Computer Architecture (IEEE Cat. No.RS00201).

[22]  Todd M. Austin,et al.  A Systematic Methodology to Compute the Architectural Vulnerability Factors for a High-Performance Microprocessor , 2003, MICRO.

[23]  Arijit Biswas,et al.  Computing architectural vulnerability factors for address-based structures , 2005, 32nd International Symposium on Computer Architecture (ISCA'05).

[24]  Arun K. Somani,et al.  Area efficient architectures for information integrity in cache memories , 1999, ISCA.

[25]  Dongsoo Han,et al.  Ubiscript: A Script Language for Ubiquitous Environment , 2011, J. Comput. Sci. Eng..

[26]  Babak Falsafi,et al.  Dual use of superscalar datapath for transient-fault detection and recovery , 2001, MICRO.

[27]  Chin-Long Chen,et al.  Error-Correcting Codes for Semiconductor Memory Applications: A State-of-the-Art Review , 1984, IBM J. Res. Dev..

[28]  Timothy J. Dell,et al.  A white paper on the benefits of chipkill-correct ecc for pc server main memory , 1997 .

[29]  Aamer Jaleel,et al.  Explaining cache SER anomaly using DUE AVF measurement , 2010, HPCA - 16 2010 The Sixteenth International Symposium on High-Performance Computer Architecture.