A confidence-driven model for error-resilient computing

We propose an adaptive reliability enhancement structure for deeply scaled CMOS and future devices that exhibit nondeterministic behavior. This structure forms the basis of a confidence-driven computing model that can be implemented as either a rollback recovery method or an iterative dual modular redundancy method incorporating synchronous handshake schemes. The performance and cost of the computing model are estimated using a 45 nm CMOS technology, and the functionality is verified by FPGA-based emulation. The model is demonstrated using a 16-bit, 12-stage CORDIC processor operating under random transient errors. It adapts to fluctuating error rates at the device substrate level to guarantee the reliability of computation at the system level. To guarantee a system-level mean time to failure of two years, the model requires 4.2 times less area and incurs 2.7 times less energy overhead than triple modular redundancy.
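As a rough illustration of the iterative dual modular redundancy idea, the following Python sketch repeats a duplicated computation until a confidence counter reaches a threshold. It is a minimal sketch under stated assumptions, not the structure proposed in the paper: the names `noisy_add` and `confident_compute`, the single-bit-flip error model, and the counter reset policy are all hypothetical choices made for illustration.

```python
import random


def noisy_add(a, b, error_rate, rng):
    """Hypothetical unreliable unit: a 16-bit adder that flips one random
    output bit with probability `error_rate` (assumed error model)."""
    result = (a + b) & 0xFFFF
    if rng.random() < error_rate:
        result ^= 1 << rng.randrange(16)
    return result


def confident_compute(unit, args, threshold, rng, error_rate, max_iters=10_000):
    """Run two copies of `unit` per iteration and accept a value only after
    `threshold` consecutive iterations in which both copies agree on it.

    The reset policy below is an assumption made for this sketch.
    Returns (accepted_value, iterations_used).
    """
    confidence = 0
    candidate = None
    for iteration in range(1, max_iters + 1):
        first = unit(*args, error_rate, rng)
        second = unit(*args, error_rate, rng)
        if first != second:
            # The copies disagree: discard all accumulated confidence.
            candidate, confidence = None, 0
        elif candidate is None or first == candidate:
            # The copies agree and match the current candidate (if any).
            candidate = first
            confidence += 1
        else:
            # The copies agree with each other but contradict the candidate:
            # start over with the new value.
            candidate, confidence = first, 1
        if confidence >= threshold:
            return candidate, iteration
    raise RuntimeError("confidence threshold not reached")


if __name__ == "__main__":
    rng = random.Random(0)
    for rate in (0.0, 0.1, 0.3):
        value, iters = confident_compute(noisy_add, (1200, 34), 3, rng, rate)
        print(f"error_rate={rate}: value={value}, iterations={iters}")
```

In this sketch the number of iterations grows on its own as the error rate rises, because disagreements reset the counter. This loosely mirrors the adaptive behavior described above, whereas a fixed redundancy scheme such as triple modular redundancy pays the same cost regardless of the error rate.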
