Yield-Aware Cache Architectures

One of the major issues faced by the semiconductor industry today is that of reducing chip yields. As the process technologies have scaled to smaller feature sizes, chip yields have dropped to around 50% or less. This figure is expected to decrease even further in future technologies. To attack this growing problem, we develop four yield-aware micro architecture schemes for data caches. The first one is called yield-aware power-down (YAPD). YAPD turns off cache ways that cause delay violation and/or have excessive leakage. We also modify this approach to achieve better yields. This new method is called horizontal YAPD (H-YAPD), which turns off horizontal regions of the cache instead of ways. A third approach targets delay violation in data caches. Particularly, we develop a variable-latency cache architecture (VACA). VACA allows different load accesses to be completed with varying latencies. This is enabled by augmenting the functional units with special buffers that allow the dependants of a load operation to stall for a cycle if the load operation is delayed. As a result, if some accesses take longer than the predefined number of cycles, the execution can still be performed correctly, albeit with some performance degradation. A fourth scheme we devise is called the hybrid mechanism, which combines the YAPD and the VACA. As a result of these schemes, chips that may be tossed away due to parametric yield loss can be saved. Experimental results demonstrate that the yield losses can be reduced by 68.1% and 72.4% with YAPD and H-YAPD schemes and by 33.3% and 81.1% with VACA and Hybrid mechanisms, respectively, improving the overall yield to as much as 97.0%

[1]  James Tschanz,et al.  Parameter variations and impact on circuits and microarchitecture , 2003, Proceedings 2003. Design Automation Conference (IEEE Cat. No.03CH37451).

[2]  Costas J. Spanos,et al.  Modeling within-die spatial correlation effects for process-design co-optimization , 2005, Sixth international symposium on quality electronic design (isqed'05).

[3]  Doug Burger,et al.  An adaptive, non-uniform cache structure for wire-delay dominated on-chip caches , 2002, ASPLOS X.

[4]  Kaushik Roy,et al.  A novel on-chip delay measurement hardware for efficient speed-binning , 2005, 11th IEEE International On-Line Testing Symposium.

[5]  M.A. Horowitz,et al.  Speed and power scaling of SRAM's , 2000, IEEE Journal of Solid-State Circuits.

[6]  Sani R. Nassif,et al.  Modeling and analysis of manufacturing variations , 2001, Proceedings of the IEEE 2001 Custom Integrated Circuits Conference (Cat. No.01CH37169).

[7]  Hai Zhou,et al.  Statistical gate sizing for timing yield optimization , 2005, ICCAD-2005. IEEE/ACM International Conference on Computer-Aided Design, 2005..

[8]  David H. Albonesi,et al.  Selective cache ways: on-demand cache resource allocation , 1999, MICRO-32. Proceedings of the 32nd Annual ACM/IEEE International Symposium on Microarchitecture.

[9]  Kaushik Roy,et al.  Novel sizing algorithm for yield improvement under process variation in nanometer technology , 2004, Proceedings. 41st Design Automation Conference, 2004..

[10]  Sachin S. Sapatnekar,et al.  Statistical timing analysis considering spatial correlations using a single PERT-like traversal , 2003, ICCAD-2003. International Conference on Computer Aided Design (IEEE Cat. No.03CH37486).

[11]  C.H. Kim,et al.  An on-die CMOS leakage current sensor for measuring process variation in sub-90nm generations , 2004, 2005 International Conference on Integrated Circuit Design and Technology, 2005. ICICDT 2005..

[12]  David Blaauw,et al.  Modeling and analysis of parametric yield under power and performance constraints , 2005, IEEE Design & Test of Computers.

[13]  Kaushik Roy,et al.  Delay Modeling and Statistical Design of Pipelined Circuit Under Process Variation , 2006, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems.

[14]  K. Ravindran,et al.  First-Order Incremental Block-Based Statistical Timing Analysis , 2004, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems.

[15]  Vikas Mehrotra,et al.  Modeling the effects of systematic process variation of circuit performance , 2001 .

[16]  Michael Orshansky,et al.  An efficient algorithm for statistical minimization of total power under timing yield constraints , 2005, Proceedings. 42nd Design Automation Conference, 2005..

[17]  David Blaauw,et al.  Statistical timing analysis for intra-die process variations with spatial correlations , 2003, ICCAD-2003. International Conference on Computer Aided Design (IEEE Cat. No.03CH37486).

[18]  Sarma B. K. Vrudhula,et al.  A methodology to improve timing yield in the presence of process variations , 2004, Proceedings. 41st Design Automation Conference, 2004..

[19]  Natesan Venkateswaran,et al.  First-Order Incremental Block-Based Statistical Timing Analysis , 2006, IEEE Trans. Comput. Aided Des. Integr. Circuits Syst..

[20]  Brad Calder,et al.  Basic block distribution analysis to find periodic behavior and simulation points in applications , 2001, Proceedings 2001 International Conference on Parallel Architectures and Compilation Techniques.

[21]  Kaushik Roy,et al.  A process-tolerant cache architecture for improved yield in nanoscale technologies , 2005, IEEE Transactions on Very Large Scale Integration (VLSI) Systems.

[22]  Rouwaida Kanj,et al.  System-level SRAM yield enhancement , 2006, 7th International Symposium on Quality Electronic Design (ISQED'06).

[23]  Kaushik Roy,et al.  Gated-Vdd: a circuit technique to reduce leakage in deep-submicron cache memories , 2000, ISLPED '00.

[24]  John L. Henning SPEC CPU2000: Measuring CPU Performance in the New Millennium , 2000, Computer.

[25]  Kaushik Roy,et al.  Speed binning aware design methodology to improve profit under parameter variations , 2006, Asia and South Pacific Conference on Design Automation, 2006..

[26]  S. G. Duvall,et al.  Statistical circuit modeling and optimization , 2000, 2000 5th International Workshop on Statistical Metrology (Cat.No.00TH8489.

[27]  Trevor Mudge,et al.  Razor: a low-power pipeline based on circuit-level timing speculation , 2003, Proceedings. 36th Annual IEEE/ACM International Symposium on Microarchitecture, 2003. MICRO-36..

[28]  E. Alon,et al.  The implementation of a 2-core, multi-threaded itanium family processor , 2006, IEEE Journal of Solid-State Circuits.

[29]  Kaushik Roy,et al.  Yield Prediction of High Performance Pipelined Circuit with Respect to Delay Failures in Sub-100nm Technology , 2005, 11th IEEE International On-Line Testing Symposium.

[30]  Man Lung Mui,et al.  Power supply optimization in sub-130 nm leakage dominant technologies , 2004, International Symposium on Signals, Circuits and Systems. Proceedings, SCS 2003. (Cat. No.03EX720).

[31]  Kevin Skadron,et al.  Impact of Parameter Variations on Multi-Core Chips , 2006 .