Oware: Operand width Aware Redundant Execution for Whole-Processor Error Detection

As semiconductor feature sizes continue to shrink, high-performance microprocessors are becoming increasingly susceptible to soft errors. Prior in-register duplication approaches exploit the prevalence of narrow-width values in applications to improve the reliability of the register file and other data-holding components at low performance cost, but they leave the rest of the datapath highly vulnerable. This paper presents a novel soft error detection technique that covers the whole processor through redundant operations, while reducing performance degradation by alleviating resource contention. Experimental results show that our scheme achieves approximately 72% higher IPC than conventional symmetric redundant execution.
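To make the narrow-width idea behind the prior in-register duplication work concrete, the following is a minimal sketch, not the mechanism proposed in this paper. It assumes a 64-bit register and treats a value as narrow when it fits in 32 signed bits; the names `is_narrow`, `pack_duplicate`, and `check_and_read` are hypothetical and chosen only for illustration.

```python
# Illustrative sketch only: assumes 64-bit registers and a 32-bit "narrow" threshold.
WIDTH = 64
HALF = WIDTH // 2
HALF_MASK = (1 << HALF) - 1


def is_narrow(value: int) -> bool:
    """Return True if a signed value is representable in HALF bits."""
    return -(1 << (HALF - 1)) <= value < (1 << (HALF - 1))


def pack_duplicate(value: int) -> int:
    """Store the low half of a narrow value in both halves of the register."""
    low = value & HALF_MASK
    return (low << HALF) | low


def check_and_read(reg: int) -> int:
    """Compare the two copies; raise on mismatch, else return the signed value."""
    low = reg & HALF_MASK
    high = (reg >> HALF) & HALF_MASK
    if low != high:
        raise ValueError("soft error detected: register halves differ")
    # Sign-extend the HALF-bit value back to a Python int.
    if low >= 1 << (HALF - 1):
        low -= 1 << HALF
    return low


if __name__ == "__main__":
    reg = pack_duplicate(-5)
    print(check_and_read(reg))          # clean read of the duplicated value
    try:
        check_and_read(reg ^ (1 << 3))  # flip one bit to emulate a soft error
    except ValueError as err:
        print(err)
```

A duplicate kept in otherwise unused upper bits protects only the stored value. Errors arising elsewhere in the datapath are not caught by this comparison, which is the gap the abstract describes.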
