Cache design for low power and yield enhancement

One of the major limiters in computer systems and system-on-chip (SoC) designs is access to main memory, which is typically two orders of magnitude slower than the processor. To bridge this gap, modern processors already devote more than half of their on-chip transistors to the last-level cache. Caches, however, have a negative impact on area, power, and yield. The goal of this research is to design caches that operate at lower voltages while enhancing yield.

Our strategy is to improve the static noise margin (SNM) and the writability of the conventional six-transistor SRAM cell by reducing the effect of parametric variations on the cell. This is done using a novel circuit that reduces the voltage swing on the word line during read operations and reduces the memory supply voltage during write operations. The proposed circuit increases the SRAM's SNM and write margin using a single voltage supply, with minimal impact on chip area, complexity, and timing. A test chip with an 8-kilobyte SRAM block manufactured in 45-nm technology is used to verify the practicality of the contribution and to demonstrate the effectiveness of the new circuit's implementation.

Cache organization is one of the most important factors affecting cache design complexity, performance, area, and power. The main architectural choice for caches is whether to implement the tag array using a standard SRAM or a content-addressable memory (CAM). This choice has far-reaching consequences for several aspects of the cache design, and in particular for power consumption. Our contribution in this area is an in-depth study of the complex tradeoffs in area, timing, power, and design complexity between an SRAM-based tag and a CAM-based one. Our results indicate that an SRAM-based tag design often provides a better overall design point and is superior with respect to energy, especially for interleaved multi-threading processors.

Being able to test and screen chips is a key factor in achieving high yield. Most industry-standard CAD tools used to analyze fault coverage and generate test vectors require gate-level models. However, since caches are typically designed using a transistor-level flow, an abstraction step is needed to generate the gate-level models, which must be equivalent to the actual transistor-level design. The third contribution of this research is a framework to verify that the gate-level representation of a custom design is equivalent to its transistor-level design.
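
The difference between the two tag organizations discussed above can be illustrated with a small functional sketch. It is a toy model, not the design studied in this work: the function names, the 4-set by 2-way geometry, the 8-entry CAM bank, and the stored addresses are all hypothetical, and counting comparisons is only a rough stand-in for activity, not a model of energy, area, or timing.

```python
def sram_tag_lookup(sets, index_bits, block_addr):
    """Toy SRAM-tagged lookup: index one set, then compare its ways."""
    index = block_addr & ((1 << index_bits) - 1)  # low bits select the set
    tag = block_addr >> index_bits                # remaining bits form the tag
    ways = sets[index]
    hit_way = next((w for w, t in enumerate(ways) if t == tag), None)
    # Only the tags of the indexed set are read and compared.
    return hit_way, len(ways)


def cam_tag_lookup(entries, block_addr):
    """Toy CAM-tagged lookup: the key is matched against every entry."""
    hit_entry = next((i for i, t in enumerate(entries) if t == block_addr), None)
    # Every entry in the searched bank takes part in the comparison.
    return hit_entry, len(entries)


# Hypothetical contents: the same block (0b10101) stored in both organizations.
sets = [[None, None] for _ in range(4)]  # 4 sets, 2 ways each
sets[0b01][0] = 0b101                    # index 0b01, tag 0b101
entries = [None] * 8                     # one 8-entry CAM bank
entries[5] = 0b10101

sram_result = sram_tag_lookup(sets, 2, 0b10101)
cam_result = cam_tag_lookup(entries, 0b10101)
sram_miss = sram_tag_lookup(sets, 2, 0b11101)
print(sram_result, cam_result, sram_miss)
```

In this sketch the SRAM-tagged lookup touches only the ways of one set, while the CAM-tagged lookup involves every entry of the bank. A real comparison also has to account for bit-line, match-line, and decoder costs, which is why the tradeoff needs the detailed study described above.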
