Reducing resource redundancy for concurrent error detection techniques in high performance microprocessors

With shrinking feature sizes, growing chip capacity, and rising clock speeds, microprocessors are becoming increasingly susceptible to transient (soft) errors. Redundant multi-threading (RMT) is an attractive approach for concurrent error detection and recovery. However, redundant threads significantly increase the pressure on processor resources, resulting in a dramatic performance impact. In this paper, we propose reducing resource redundancy as a means of mitigating the performance impact of redundancy. In this approach, all instructions are still executed redundantly, but the redundant instructions do not use many of the resources that an instruction normally occupies. Resource redundancy is reduced by exploiting the runtime profile of the leading thread to optimally allocate resources to the trailing thread in a staggered RMT architecture. The key observation is that, even with a small slack between the two threads, many instructions in the leading thread have already produced their results before their trailing counterparts are renamed. We investigate two techniques based on this approach: (i) a register bits reuse technique, which attempts to use the same register (but different bits) for both copies of the same instruction if the result produced by the instruction is small, and (ii) a register value reuse technique, which attempts to use the same register for a main instruction and a distinct redundant instruction if both instructions produce the same result. These techniques, along with some others, are used to reduce redundancy in the register file, the reorder buffer, and the load/store buffer. The techniques are evaluated in terms of their performance, power, and vulnerability impact on an RMT processor. Our experiments show that the techniques achieve about 95% performance improvement and about 17% energy reduction. The vulnerability of the RMT processor remains the same with the techniques.
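The register bits reuse idea can be illustrated with a small software model. This is only a sketch under stated assumptions, not the paper's hardware design: the 64-bit register width, the split into two 32-bit halves, the narrowness test, and the names `is_narrow`, `pack`, `unpack`, and `copies_match` are all hypothetical choices made for illustration.

```python
# Illustrative model of register bits reuse (assumed parameters, not from the paper).
REG_BITS = 64                      # assumed physical register width
HALF_BITS = REG_BITS // 2          # assumed: each copy gets one half of the register
REG_MASK = (1 << REG_BITS) - 1
HALF_MASK = (1 << HALF_BITS) - 1


def is_narrow(value: int) -> bool:
    """Return True if a two's complement value fits in a sign-extended half register."""
    value &= REG_MASK
    # The sign bit of the low half and every bit above it must be all 0s or all 1s.
    upper = value >> (HALF_BITS - 1)
    return upper == 0 or upper == (1 << (HALF_BITS + 1)) - 1


def pack(leading: int, trailing: int) -> int:
    """Store the leading copy in the low half and the trailing copy in the high half."""
    return ((trailing & HALF_MASK) << HALF_BITS) | (leading & HALF_MASK)


def unpack(reg: int) -> tuple[int, int]:
    """Return (leading_half, trailing_half) from a packed register."""
    return reg & HALF_MASK, (reg >> HALF_BITS) & HALF_MASK


def copies_match(reg: int) -> bool:
    """Compare the two redundant results held in one shared register."""
    leading, trailing = unpack(reg)
    return leading == trailing


if __name__ == "__main__":
    result = 42
    if is_narrow(result):
        shared = pack(result, result)      # both copies share one register
        print(copies_match(shared))
```

In this model, a result that fails `is_narrow` would fall back to a separate register for the trailing copy, which corresponds to ordinary RMT allocation.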
