Tolerance of performance degrading faults for effective yield improvement

To provide a new avenue for improving yield in nano-scale fabrication processes, we introduce a new notion: the performance degrading fault (pdef). A fault is a pdef if it cannot cause a functional error at system outputs but may degrade system performance. In a processor, a fault is a pdef if it causes no error in the execution of user programs but may reduce performance, e.g., decrease the number of instructions executed per cycle. By identifying faulty chips whose pdefs degrade performance only within some limits, and binning these chips based on their resulting instruction throughput, effective yield can be improved in a way that differs fundamentally from the current practice of performance binning on clock frequency. To illustrate the potential benefits of this notion, we analyze the faults in the branch prediction unit of a processor. Experimental results show that every stuck-at fault in this unit is a pdef. Furthermore, 97% of these faults induce almost no performance degradation.
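
The following is a minimal sketch of why a stuck-at fault in a branch predictor behaves as a pdef, not the experimental setup used in this work. It assumes a simple table of 2-bit saturating counters, a synthetic branch trace, and the misprediction rate as a stand-in for instruction throughput. The names `BimodalPredictor`, `synthetic_trace`, and `assign_bin`, the fault encoding `(entry, bit, value)`, and the bin limits are all hypothetical. Branch outcomes come from the trace (the program), never from the predictor, so the injected fault can only change how often predictions are wrong.

```python
import random


class BimodalPredictor:
    """Table of 2-bit saturating counters indexed by the low bits of the PC."""

    def __init__(self, entries=64, stuck=None):
        self.entries = entries
        self.table = [1] * entries  # every counter starts at "weakly not taken"
        self.stuck = stuck          # hypothetical fault: (entry, bit, value) or None

    def _read(self, idx):
        value = self.table[idx]
        if self.stuck is not None and self.stuck[0] == idx:
            _, bit, forced = self.stuck
            # Model a stuck-at fault: the bit always reads as the forced value.
            value = (value | (1 << bit)) if forced else (value & ~(1 << bit))
        return value

    def predict(self, pc):
        return self._read(pc % self.entries) >= 2  # True means "predict taken"

    def update(self, pc, taken):
        idx = pc % self.entries
        counter = self._read(idx)
        self.table[idx] = min(3, counter + 1) if taken else max(0, counter - 1)


def synthetic_trace(n=20000, branches=32, seed=0):
    """Assumed workload: each branch is strongly biased taken or not taken."""
    rng = random.Random(seed)
    bias = [rng.choice([0.05, 0.95]) for _ in range(branches)]
    trace = []
    for _ in range(n):
        pc = rng.randrange(branches)
        trace.append((pc, rng.random() < bias[pc]))
    return trace


def misprediction_rate(predictor, trace):
    """The actual outcome is taken from the trace, so results stay correct."""
    wrong = 0
    for pc, taken in trace:
        if predictor.predict(pc) != taken:
            wrong += 1
        predictor.update(pc, taken)
    return wrong / len(trace)


def assign_bin(degradation, limits=(0.001, 0.01)):
    """Return a bin index (0 = negligible loss) or None to reject the chip."""
    for index, limit in enumerate(limits):
        if degradation <= limit:
            return index
    return None


trace = synthetic_trace()
baseline = misprediction_rate(BimodalPredictor(), trace)
# Inject a stuck-at-1 fault on the high bit of counter 0.
faulty = misprediction_rate(BimodalPredictor(stuck=(0, 1, 1)), trace)
chip_bin = assign_bin(faulty - baseline)
```

Here `chip_bin` plays the role of a throughput bin: a chip whose fault costs little or nothing lands in bin 0, a chip with a larger but bounded loss lands in a higher bin, and a chip beyond the last limit is rejected. A real flow would measure instructions per cycle on representative workloads instead of this proxy.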
