DARA: A Low-Cost Reliable Architecture Based on Unhardened Devices and Its Case Study of Radiation Stress Test

A microprocessor that uses architectural redundancy to achieve high dependability was designed and manufactured to explore how effectively soft errors can be tolerated without circuit hardening. The processor architecture is based on a modularized pipeline that includes mechanisms for real-time error detection and fast roll-back recovery. To prepare for a possible increase in hard errors in future technologies, the processor also covers hard errors in an energy-efficient way by dynamically adapting its redundancy between dual and triple modules. A radiation stress test indicates that the redundant but unhardened processor achieves the same dependability as a hardened processor. Our synthesis and layout results show that radiation-hardened circuits increase processor hardware area by 71% and power by 28%. These results suggest that architectural redundancy can be used instead of circuit hardening to achieve cost-effective reliability.
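
The interplay of error detection, roll-back recovery, and dual/triple adaptation can be illustrated with a small software model. The sketch below is not the DARA hardware design. It assumes a hypothetical policy in which two modules compare results at every step, a mismatch triggers a roll-back to the last committed state and a retry, and mismatches that persist beyond `retry_limit` retries are treated as a hard error and answered by enabling a third module with majority voting. The names `run`, `soft_fault`, `hard_fault`, `retry_limit`, and `max_attempts` are all illustrative assumptions.

```python
from collections import Counter


def run(step, state, n_steps, fault, retry_limit=1, max_attempts=8):
    """Toy model of redundant execution with roll-back (assumed policy, not DARA's)."""
    modules = [0, 1]  # start in dual-module mode
    log = []
    for i in range(n_steps):
        checkpoint = state  # last committed state, restored on roll-back
        attempt = 0
        while True:
            # Every active module computes the same step; `fault` may corrupt a result.
            results = [fault(i, m, attempt, step(checkpoint)) for m in modules]
            value, votes = Counter(results).most_common(1)[0]
            if votes >= 2:  # both agree (dual) or a majority agrees (triple)
                break
            attempt += 1
            log.append(("rollback", i))
            if attempt >= max_attempts:
                raise RuntimeError("no agreement reached")
            if len(modules) == 2 and attempt > retry_limit:
                # Persistent mismatch: assume a hard error and add a third module.
                modules = [0, 1, 2]
                log.append(("switch_to_tmr", i))
        state = value  # commit the agreed result
    return state, len(modules), log


def soft_fault(i, m, attempt, value):
    """Transient upset: module 1 flips one bit at step 2, first attempt only."""
    return value ^ 1 if (i, m, attempt) == (2, 1, 0) else value


def hard_fault(i, m, attempt, value):
    """Permanent defect: module 1 returns a wrong value from step 3 onward."""
    return -1 if (m == 1 and i >= 3) else value
```

In this model, `run(lambda x: x + 1, 0, 5, soft_fault)` recovers with a single roll-back and stays in dual mode, whereas the same call with `hard_fault` rolls back, fails again on retry, and then continues in triple mode with the faulty module outvoted.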
