Techniques to improve the hard and soft error reliability of distributed architectures

Aggressive technology scaling, rising on-chip integration, and the continued increase in microprocessor power and thermal density threaten both the hard and soft error reliability of future microprocessor designs. Designing low-overhead mechanisms for improving reliability will therefore be a critical requirement at future technology nodes. Wire-delay and power-consumption constraints, together with limits on deep pipelining, have impelled a shift to distributed architectures. These architectures rely on modular design, use on-chip interconnection networks for communication, and place a greater burden on software to exploit application concurrency and achieve high performance on the distributed substrate [1]. This dissertation focuses on architectural techniques for improving the hard and soft error reliability of future technology-scalable distributed architectures. We make the key observation that these underlying principles of distributed architectures have important synergies that can be exploited to improve the hard and soft error reliability of microprocessors at low overhead. Using a detailed end-to-end model for chip yield, we demonstrate that with only redundant rows and columns in memory arrays and caches, the yield of chip multiprocessors drops substantially, from 85% at 250nm to 60% at 50nm. To improve yield at low performance overhead, we exploit three principles of modern and future distributed architectures: the abundant microarchitectural redundancy provided by modular design, the natural redundancy in communication paths provided by multi-hop, routed, on-chip networks, and the availability of greater software assistance for managing this redundancy efficiently. Using just modular redundancy at the intra- and inter-processor granularity, we improve the yield of chip multiprocessors to 99.6% at 50nm, with a maximum reduction in performance in any chip of less than 20%.
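
To give a rough feel for why modular redundancy helps yield, the sketch below uses a simple k-of-n binomial model. It is an illustration only and not the end-to-end yield model used in the dissertation: it assumes a chip with `n` identical modules that fail independently, each defect-free with a hypothetical probability `p`, and it counts a chip as usable when at least `k` modules work. The function name and all numbers are assumptions made for this example.

```python
from math import comb


def k_of_n_yield(n, k, p):
    """Probability that at least k of n independent modules are defect-free.

    Assumption for illustration: modules fail independently, each with
    the same defect-free probability p.
    """
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Hypothetical numbers, not taken from the dissertation.
n_cores = 8
p_core = 0.9

# Without redundancy every core must work.
strict = k_of_n_yield(n_cores, n_cores, p_core)

# With modular redundancy the chip ships in a degraded mode
# as long as no more than two cores are defective.
degraded = k_of_n_yield(n_cores, n_cores - 2, p_core)

print(f"all cores required: {strict:.3f}")
print(f"two bad cores tolerated: {degraded:.3f}")
```

Under these assumptions, tolerating even a few defective modules raises the fraction of usable chips considerably, at the price of lower performance on the degraded parts.
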
We further extend the modular-redundancy technique to take advantage of the block-atomic, static-placement, dynamic-issue execution model of the TRIPS architecture to efficiently manage the redundancy provided by modular design and on-chip networks. Our evaluation of this compiler-assisted yield enhancement technique in the TRIPS architecture shows significant yield improvement with less than 4% impact on performance. This dissertation also quantitatively demonstrates, through detailed modeling, that the raw soft error rate, especially that of combinational logic, will increase substantially at future technologies. This emphasizes the need for innovative solutions that extend soft error protection to latches and combinational logic while appropriately balancing power consumption, area, and complexity overhead. We propose a new class of better-than-worst-case soft error reliability techniques, called AVF throttling, that trade concurrency for a reduction in the amount of processor state vulnerable to soft errors. Because future architectures must increasingly rely on exploiting concurrency to achieve high performance, they aggressively bring future program state into the processor and mine it for available parallelism, thus increasing the amount of vulnerable state. AVF throttling is based on the key observation that while exploiting concurrency on the critical path can significantly improve performance, the majority of the program has abundant slack and can be deferred to substantially reduce the amount of vulnerable state with negligible effect on execution time. Our evaluation in the TRIPS architecture shows that around 90% of the vulnerable state is due to slack. We design a hybrid AVF throttling technique that uses the compiler to estimate slack and the hardware to dynamically exploit it. Using the compiler for static slack estimation considerably reduces the complexity of the technique.
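
The notion of slack used above can be illustrated with a generic critical-path calculation over a dataflow graph. The sketch below is not the dissertation's compiler pass: it assumes a small acyclic dependence graph with hypothetical fixed instruction latencies, and it defines slack as the difference between an instruction's latest and earliest possible start times. The function name, graph, and latencies are all made up for this example.

```python
from collections import defaultdict


def compute_slack(latency, edges):
    """Return {instruction: slack in cycles} for an acyclic dataflow graph.

    latency: {instruction: cycles}  (hypothetical fixed latencies)
    edges:   list of (producer, consumer) dependences
    """
    succs, preds = defaultdict(list), defaultdict(list)
    for producer, consumer in edges:
        succs[producer].append(consumer)
        preds[consumer].append(producer)

    # Topological order (Kahn's algorithm).
    indegree = {n: len(preds[n]) for n in latency}
    ready = [n for n in latency if indegree[n] == 0]
    order = []
    while ready:
        n = ready.pop()
        order.append(n)
        for s in succs[n]:
            indegree[s] -= 1
            if indegree[s] == 0:
                ready.append(s)

    # Forward pass: earliest start once all producers have finished.
    earliest = {}
    for n in order:
        earliest[n] = max((earliest[p] + latency[p] for p in preds[n]), default=0)
    total = max(earliest[n] + latency[n] for n in order)

    # Backward pass: latest start that does not lengthen the schedule.
    latest = {}
    for n in reversed(order):
        latest[n] = min((latest[s] for s in succs[n]), default=total) - latency[n]

    return {n: latest[n] - earliest[n] for n in order}


# Hypothetical four-instruction block.
latency = {"load": 3, "add": 1, "mul": 1, "store": 1}
edges = [("load", "add"), ("add", "store"), ("mul", "store")]
slack = compute_slack(latency, edges)
print(slack)
```

In this toy block, `load`, `add`, and `store` lie on the critical path and have zero slack, while `mul` could start three cycles later without lengthening execution. Deferring instructions like `mul` is the kind of opportunity a slack-based throttling scheme would look for.
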
The hybrid technique also takes advantage of the TRIPS execution model and on-chip networks to exploit slack more efficiently, and improves reliability by 25-42% for a set of SPEC and EEMBC benchmarks. We also present a detailed comparison of AVF throttling with prior approaches, including redundant execution and selective redundant execution. Based on this comparison, we argue that while AVF throttling may provide a smaller absolute reliability improvement, it significantly reduces power consumption and complexity overhead, which makes the three techniques appropriate for systems with different reliability requirements. Overall, this dissertation establishes that distributed architectures provide a good foundation for building a reliable system from unreliable components, and our results provide a good starting point for further research in this area.

[1]  Michael Nicolaidis,et al.  Carry checking/parity prediction adders and ALUs , 2003, IEEE Trans. Very Large Scale Integr. Syst..

[2]  Doug Burger,et al.  Measuring Experimental Error in Microprocessor Simulation , 2001, ISCA 2001.

[3]  Antonio J. Acosta,et al.  Logical modelling of delay degradation effect in static CMOS gates , 2000 .

[4]  R. Nagarajan,et al.  A design space evaluation of grid processor architectures , 2001, Proceedings. 34th ACM/IEEE International Symposium on Microarchitecture. MICRO-34.

[5]  Valeria Bertacco,et al.  Assessing SEU Vulnerability via Circuit-Level Timing Analysis , 2005 .

[6]  Shekhar Y. Borkar VLSI Design Challenges for Gigascale Integration , 2005, VLSI Design.

[7]  Krisztián Flautner,et al.  A Microarchitectural Analysis of Soft Error Propagation in a Production-Level Embedded Microprocessor , 2005 .

[8]  Nicholas P. Mencinger,et al.  A Mechanism-Based Methodology for Processor Package Reliability Assessments , 2000 .

[9]  Shubhendu S. Mukherjee,et al.  Transient fault detection via simultaneous multithreading , 2000, Proceedings of 27th International Symposium on Computer Architecture (IEEE Cat. No.RS00201).

[10]  C. H. Stapper,et al.  Yield Model for Productivity Optimization of VLSI Memory Chips with Redundancy and Partially Good Product , 1980, IBM J. Res. Dev..

[11]  Leo B. Freeman Critical charge calculations for a bipolar SRAM array , 1996, IBM J. Res. Dev..

[12]  Ken Mai,et al.  The future of wires , 2001, Proc. IEEE.

[13]  Shubhendu S. Mukherjee,et al.  Detailed design and evaluation of redundant multithreading alternatives , 2002, ISCA.

[14]  Lisa Spainhower,et al.  IBM S/390 Parallel Enterprise Server G5 fault tolerance: A historical perspective , 1999, IBM J. Res. Dev..

[15]  Anand Sivasubramaniam,et al.  SlicK: slice-based locality exploitation for efficient redundant multithreading , 2006, ASPLOS XII.

[16]  N. Ranganathan,et al.  A wire-delay scalable microprocessor architecture for high performance systems , 2003, 2003 IEEE International Solid-State Circuits Conference, 2003. Digest of Technical Papers. ISSCC..

[17]  Karthikeyan Sankaralingam,et al.  Implementation and Evaluation of a Dynamically Routed Processor Operand Network , 2007, First International Symposium on Networks-on-Chip (NOCS'07).

[18]  Janusz Rajski,et al.  Logic BIST for large industrial designs: real issues and case studies , 1999, International Test Conference 1999. Proceedings (IEEE Cat. No.99CH37034).

[19]  T. Juhnke,et al.  Calculation of the Soft Error Rate of Submicron CMOS Logic Circuits , 1994, ESSCIRC '94: Twentieth European Solid-State Circuits Conference.

[20]  Edward J. McCluskey,et al.  Which concurrent error detection scheme to choose ? , 2000, Proceedings International Test Conference 2000 (IEEE Cat. No.00CH37159).

[21]  Rajeev Balasubramonian,et al.  Power-Efficient Approaches to , 2007 .

[22]  T. N. Vijaykumar,et al.  Rescue: a microarchitecture for testability and defect tolerance , 2005, 32nd International Symposium on Computer Architecture (ISCA'05).

[23]  P. Hazucha,et al.  Impact of CMOS technology scaling on the atmospheric neutron soft error rate , 2000 .

[24]  James F. Ziegler,et al.  Terrestrial cosmic rays , 1996, IBM J. Res. Dev..

[25]  David Blaauw,et al.  ElastIC: An Adaptive Self-Healing Architecture for Unpredictable Silicon , 2006, IEEE Design & Test of Computers.

[26]  Ming Zhang,et al.  Circuit Failure Prediction and Its Application to Transistor Aging , 2007, 25th IEEE VLSI Test Symposium (VTS'07).

[27]  Scott Mahlke,et al.  Effective compiler support for predicated execution using the hyperblock , 1992, MICRO 1992.

[28]  Peter Hazucha,et al.  Characterization of soft errors caused by single event upsets in CMOS processes , 2004, IEEE Transactions on Dependable and Secure Computing.

[29]  Sanjay J. Patel,et al.  Instruction fetch deferral using static slack , 2002, 35th Annual IEEE/ACM International Symposium on Microarchitecture, 2002. (MICRO-35). Proceedings..

[30]  E. A. Burke,et al.  Calculation of Cosmic-Ray Induced Soft Upsets and Scaling in VLSI Devices , 1982, IEEE Transactions on Nuclear Science.

[31]  Daniel H. Linder,et al.  An Adaptive and Fault Tolerant Wormhole Routing Strategy for k-Ary n-Cubes , 1994, IEEE Trans. Computers.

[32]  Neil Vachharajani,et al.  Non-Uniform Fault Tolerance , 2006 .

[33]  Douglas C. Burger,et al.  Design and evaluation of a technology-scalable architecture for instruction-level parallelism , 2007 .

[34]  G. R. Srinivasan,et al.  Soft-error Monte Carlo modeling program, SEMM , 1996, IBM J. Res. Dev..

[35]  David I. August,et al.  Design and evaluation of hybrid fault-detection systems , 2005, 32nd International Symposium on Computer Architecture (ISCA'05).

[36]  J. Maiz,et al.  Alpha-SER modeling and simulation for sub-0.25 μm CMOS technology , 1999, 1999 Symposium on VLSI Technology. Digest of Technical Papers (IEEE Cat. No.99CH36325).

[37]  Nur A. Touba,et al.  Cost-effective approach for reducing soft error failure rate in logic circuits , 2003, International Test Conference, 2003. Proceedings. ITC 2003..

[38]  N. Seifert,et al.  Timing vulnerability factors of sequentials , 2004, IEEE Transactions on Device and Materials Reliability.

[39]  T. Sugii,et al.  Impact of cosmic ray neutron induced soft errors on advanced submicron CMOS circuits , 1996, 1996 Symposium on VLSI Technology. Digest of Technical Papers.

[40]  Timothy J. Maloney,et al.  The Quality and Reliability of Intel's Quarter Micron Process , 2000 .

[41]  John F. Meyer,et al.  On Evaluating the Performability of Degradable Computing Systems , 1980, IEEE Transactions on Computers.

[42]  Janak H. Patel,et al.  A logic-level model for α-particle hits in CMOS circuits , 1993, Proceedings of 1993 IEEE International Conference on Computer Design ICCD'93.

[43]  H. Peter Hofstee,et al.  Introduction to the Cell multiprocessor , 2005, IBM J. Res. Dev..

[44]  Kathryn S. McKinley,et al.  Static placement, dynamic issue (SPDI) scheduling for EDGE architectures , 2004, Proceedings. 13th International Conference on Parallel Architecture and Compilation Techniques, 2004. PACT 2004..

[45]  David Blaauw,et al.  An Efficient Static Algorithm for Computing the Soft Error Rates of Combinational Circuits , 2006, Proceedings of the Design Automation & Test in Europe Conference.

[46]  H. Peter Hofstee Power-constrained microprocessor design , 2002, Proceedings. IEEE International Conference on Computer Design: VLSI in Computers and Processors.

[47]  Xia Chen,et al.  A spatial path scheduling algorithm for EDGE architectures , 2006, ASPLOS XII.

[48]  David I. August,et al.  Software-controlled fault tolerance , 2005, TACO.

[49]  Babak Falsafi,et al.  Dual use of superscalar datapath for transient-fault detection and recovery , 2001, MICRO.

[50]  Doug Burger,et al.  Exploiting microarchitectural redundancy for defect tolerance , 2003, Proceedings 21st International Conference on Computer Design.

[51]  Alvin R. Lebeck,et al.  Exploiting Load Latency Tolerance in Dynamically Scheduled Processors , 1998 .

[52]  Sanjay J. Patel,et al.  ReStore: Symptom-Based Soft Error Detection in Microprocessors , 2006, IEEE Trans. Dependable Secur. Comput..

[53]  Lisa Spainhower,et al.  Commercial fault tolerance: a tale of two systems , 2004, IEEE Transactions on Dependable and Secure Computing.

[54]  André DeHon,et al.  Seven strategies for tolerating highly defective fabrication , 2005, IEEE Design & Test of Computers.

[55]  Norman P. Jouppi,et al.  Single-ISA heterogeneous multi-core architectures: the potential for processor power reduction , 2003, Proceedings. 36th Annual IEEE/ACM International Symposium on Microarchitecture, 2003. MICRO-36..

[56]  Nhon Quach,et al.  High Availability and Reliability in the Itanium Processor , 2000, IEEE Micro.

[57]  Wojciech Maly,et al.  Yield estimation model for VLSI artwork evaluation , 1983 .

[58]  Luiz André Barroso,et al.  Piranha: a scalable architecture based on single-chip multiprocessing , 2000, Proceedings of 27th International Symposium on Computer Architecture (IEEE Cat. No.RS00201).

[59]  William J. Dally,et al.  Fault Tolerance Techniques for the Merrimac Streaming Supercomputer , 2005, ACM/IEEE SC 2005 Conference (SC'05).

[60]  M. Baze,et al.  Attenuation of single event induced pulses in CMOS combinational logic , 1997 .

[61]  Steven S. Muchnick,et al.  Efficient instruction scheduling for a pipelined architecture , 1986, SIGPLAN '86.

[62]  Henry Hoffmann,et al.  The Raw Microprocessor: A Computational Fabric for Software Circuits and General-Purpose Programs , 2002, IEEE Micro.

[63]  Timothy J. Dell,et al.  A white paper on the benefits of chipkill-correct ecc for pc server main memory , 1997 .

[64]  Balaram Sinharoy,et al.  POWER4 system microarchitecture , 2002, IBM J. Res. Dev..

[65]  Kang G. Shin,et al.  Adaptive Fault-Tolerant Deadlock-Free Routing in Meshes and Hypercubes , 1996, IEEE Trans. Computers.

[66]  Sarita V. Adve,et al.  As scaling threatens to erode reliability standards, lifetime reliability must become a first-class design constraint. Microarchitectural intervention offers a novel way to manage lifetime reliability without significantly sacrificing cost and performance , 2005 .

[67]  B. Davari CMOS technology scaling, 0.1 μm and beyond , 1996, International Electron Devices Meeting. Technical Digest.

[68]  Sanjay J. Patel,et al.  Examining ACE analysis reliability estimates using fault-injection , 2007, ISCA '07.

[69]  O. Semenov,et al.  CMOS IC technology scaling and its impact on burn-in , 2004, IEEE Transactions on Device and Materials Reliability.

[70]  Ming Zhang,et al.  A soft error rate analysis (SERA) methodology , 2004, ICCAD 2004.

[71]  James C. Pickel,et al.  Effect of CMOS Miniaturization on Cosmic-Ray-Induced Error Rate , 1982, IEEE Transactions on Nuclear Science.

[72]  David J. Sager,et al.  The microarchitecture of the Pentium 4 processor , 2001 .

[73]  S.D. LaLumondiere,et al.  Topology-related upset mechanisms in design hardened storage cells , 1997, RADECS 97. Fourth European Conference on Radiation and its Effects on Components and Systems (Cat. No.97TH8294).

[74]  Doug Burger,et al.  An adaptive, non-uniform cache structure for wire-delay dominated on-chip caches , 2002, ASPLOS X.

[75]  Bin Zhang,et al.  FASER: fast analysis of soft error susceptibility for cell-based designs , 2006, 7th International Symposium on Quality Electronic Design (ISQED'06).

[76]  William J. Dally,et al.  Deadlock-Free Adaptive Routing in Multicomputer Networks Using Virtual Channels , 1993, IEEE Trans. Parallel Distributed Syst..

[77]  Mark Horowitz,et al.  Timing Models for MOS Circuits , 1983 .

[78]  Engin Ipek,et al.  Utilizing Dynamically Coupled Cores to Form a Resilient Chip Multiprocessor , 2007, 37th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN'07).

[79]  R. Hokinson,et al.  Historical trend in alpha-particle induced soft error rates of the Alpha™ microprocessor , 2001, 2001 IEEE International Reliability Physics Symposium Proceedings. 39th Annual (Cat. No.00CH37167).

[80]  Karthikeyan Sankaralingam,et al.  Dataflow Predication , 2006, 2006 39th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'06).

[81]  A. Klaiber The Technology Behind Crusoe™ Processors: Low-power x86-Compatible Processors Implemented with Code Morphing , 2000 .

[82]  Karthikeyan Sankaralingam,et al.  Polymorphous architectures: a unified approach for extracting concurrency of different granularities , 2006 .

[83]  Simha Sethumadhavan,et al.  Distributed Microarchitectural Protocols in the TRIPS Prototype Processor , 2006, 2006 39th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'06).

[84]  Peter Hazucha Background radiation and soft errors in CMOS circuits , 2000 .

[85]  Israel Koren,et al.  Defect tolerance in VLSI circuits: techniques and yield analysis , 1998, Proc. IEEE.

[86]  Sarita V. Adve,et al.  Predictive dynamic thermal management for multimedia applications , 2003, ICS '03.

[87]  Ravi Nair,et al.  Effect of increasing chip density on the evolution of computer architectures , 2002, IBM J. Res. Dev..

[88]  K. Johansson,et al.  In-flight and ground testing of single event upset sensitivity in static RAMs , 1997 .

[89]  Xia Chen,et al.  Critical path analysis of the TRIPS architecture , 2006, 2006 IEEE International Symposium on Performance Analysis of Systems and Software.

[90]  Ming Zhang,et al.  Logic soft errors in sub-65nm technologies design and CAD challenges , 2005, Proceedings. 42nd Design Automation Conference, 2005..

[91]  Jaehyuk Huh,et al.  Exploiting ILP, TLP, and DLP with the Polymorphous TRIPS Architecture , 2003, IEEE Micro.

[92]  Michael S. Floyd,et al.  Fault-tolerant design of the IBM pSeries 690 system using POWER4 processor technology , 2002, IBM J. Res. Dev..

[93]  Lorenzo Alvisi,et al.  Modeling the effect of technology trends on the soft error rate of combinational logic , 2002, Proceedings International Conference on Dependable Systems and Networks.

[94]  Onur Mutlu,et al.  Microarchitecture-based introspection: a technique for transient-fault tolerance in microprocessors , 2005, 2005 International Conference on Dependable Systems and Networks (DSN'05).

[95]  Sarita V. Adve,et al.  The impact of technology scaling on lifetime reliability , 2004, International Conference on Dependable Systems and Networks, 2004.

[96]  Gundolf Kiefer,et al.  Circuit partitioning for efficient logic BIST synthesis , 2001, Proceedings Design, Automation and Test in Europe. Conference and Exhibition 2001.

[97]  Michael C. Huang,et al.  Dynamically Tuning Processor Resources with Adaptive Processing , 2003, Computer.

[98]  Woody Lichtenstein,et al.  The multiflow trace scheduling compiler , 1993, The Journal of Supercomputing.

[99]  Seung-Moon Yoo,et al.  A framework for dynamic energy efficiency and temperature management , 2000, Proceedings 33rd Annual IEEE/ACM International Symposium on Microarchitecture. MICRO-33 2000.

[100]  Joefon Jann,et al.  Dynamic reconfiguration: Basic building blocks for autonomic computing on IBM pSeries servers , 2003, IBM Syst. J..

[101]  Joel Emer,et al.  A systematic methodology to compute the architectural vulnerability factors for a high-performance microprocessor , 2003, Proceedings. 36th Annual IEEE/ACM International Symposium on Microarchitecture, 2003. MICRO-36..

[102]  Smitha Menon Kalappurakkal Reducing the Soft Error Rates of a High-Performance Microprocessor Using Front-End Throttling , 2006 .

[103]  Michael L. Scott,et al.  Energy-efficient processor design using multiple clock domains with dynamic voltage and frequency scaling , 2002, Proceedings Eighth International Symposium on High Performance Computer Architecture.

[104]  T. N. Vijaykumar,et al.  Opportunistic Transient-Fault Detection , 2006, IEEE Micro.

[105]  B. Narasimham,et al.  Radiation-Induced Soft Error Rates of Advanced CMOS Bulk Devices , 2006, 2006 IEEE International Reliability Physics Symposium Proceedings.

[106]  Norman P. Jouppi,et al.  CACTI 3.0: an integrated cache timing, power, and area model , 2001 .

[107]  Dirk Grunwald,et al.  Pipeline gating: speculation control for energy reduction , 1998, ISCA.

[108]  Changkyu Kim,et al.  Elastic Threads on Composable Processors , 2006 .

[109]  Todd M. Austin,et al.  Ultra low-cost defect protection for microprocessor pipelines , 2006, ASPLOS XII.

[110]  Charles B. Weinstock,et al.  A Conceptual Framework for System Fault Tolerance , 1992 .

[111]  Johan Karlsson,et al.  On latching probability of particle induced transients in combinational networks , 1994, Proceedings of IEEE 24th International Symposium on Fault- Tolerant Computing.

[112]  Richard E. Kessler,et al.  The Alpha 21264 microprocessor , 1999, IEEE Micro.

[113]  Jiri Gaisler Evaluation of a 32-bit microprocessor with built-in concurrent error-detection , 1997, Proceedings of IEEE 27th International Symposium on Fault Tolerant Computing.

[114]  Shashank Gupta,et al.  Technology Independent Area and Delay Estimations for Microprocessor Building Blocks , 2001 .

[115]  Changhong Dai,et al.  Impact of CMOS process scaling and SOI on the soft error rates of logic processes , 2001, 2001 Symposium on VLSI Technology. Digest of Technical Papers (IEEE Cat. No.01 CH37184).

[116]  N. Seifert,et al.  Robust system design with built-in soft-error resilience , 2005, Computer.

[117]  F. M. Miles,et al.  Principles of fault tolerance , 1996, Proceedings of Applied Power Electronics Conference. APEC '96.

[118]  Michael C. Huang,et al.  Power-efficient error tolerance in chip multiprocessors , 2005, IEEE Micro.

[119]  John L. Hennessy,et al.  The Future of Systems Research , 1999, Computer.

[120]  James E. Smith,et al.  Complexity-Effective Superscalar Processors , 1997, ISCA.

[121]  Rastislav Bodík,et al.  Focusing processor policies via critical-path prediction , 2001, Proceedings 28th Annual International Symposium on Computer Architecture.

[122]  Diana Marculescu,et al.  Microarchitecture-level power management , 2002, IEEE Trans. Very Large Scale Integr. Syst..

[123]  J. F. Ziegler,et al.  Terrestrial cosmic ray intensities , 1998, IBM J. Res. Dev..

[124]  James E. Smith,et al.  Isolation in Commodity Multicore Processors , 2007, Computer.

[125]  Norman P. Jouppi,et al.  The multicluster architecture: reducing cycle time through partitioning , 1997, Proceedings of 30th Annual International Symposium on Microarchitecture.

[126]  T. Fischer,et al.  Issue Logic For A 600 MHz Out-of-order Execution , 1997, Symposium 1997 on VLSI Circuits.

[127]  Rastislav Bodík,et al.  Slack: maximizing performance under technological constraints , 2002, ISCA.

[128]  Sanjay J. Patel,et al.  Characterizing the effects of transient faults on a high-performance processor pipeline , 2004, International Conference on Dependable Systems and Networks, 2004.

[129]  Todd M. Austin,et al.  DIVA: a reliable substrate for deep submicron microarchitecture design , 1999, MICRO-32. Proceedings of the 32nd Annual ACM/IEEE International Symposium on Microarchitecture.

[130]  Dean M. Tullsen,et al.  Simultaneous multithreading: Maximizing on-chip parallelism , 1995, Proceedings 22nd Annual International Symposium on Computer Architecture.

[131]  R. Joseph Exploring Salvage Techniques for Multi-core Architectures , 2006 .

[132]  Bharat L. Bhuva,et al.  Analysis of single-event effects in combinational logic-simulation of the AM2901 bitslice processor , 2000 .

[133]  Neeraj Suri,et al.  Designing high-performance and reliable superscalar architectures-the out of order reliable superscalar (O3RS) approach , 2000, Proceeding International Conference on Dependable Systems and Networks. DSN 2000.

[134]  K. Soumyanath,et al.  Scaling trends of cosmic ray induced soft errors in static latches beyond 0.18 μ , 2001, 2001 Symposium on VLSI Circuits. Digest of Technical Papers (IEEE Cat. No.01CH37185).

[135]  James R. Larus,et al.  Software and the Concurrency Revolution , 2005, ACM Queue.

[136]  Dean Liu,et al.  Analysis of blocking dynamic circuits , 2003, IEEE Trans. Very Large Scale Integr. Syst..

[137]  Joel S. Emer,et al.  Reducing the soft-error rate of a high-performance microprocessor , 2005 .

[138]  Eric Rotenberg,et al.  AR-SMT: a microarchitectural approach to fault tolerance in microprocessors , 1999, Digest of Papers. Twenty-Ninth Annual International Symposium on Fault-Tolerant Computing (Cat. No.99CB36352).