Cache designs for reliable hybrid high and ultra-low voltage operation

Increasing demand for implementing highly-miniaturized battery-powered ultra-low-cost systems (e.g., below 1 USD) in emerging applications such as body, urban life and environment monitoring, etc., has introduced many challenges in the chip design. Such applications require high performance occasionally, but very little energy consumption during most of the time in order to extend battery lifetime. In addition, they require real-time guarantees. The most suitable technological solution for those devices consists of using hybrid processors able to operate at: (i) high voltage to provide high performance and (ii) near-/sub-threshold (NST) voltage to provide ultra-low energy consumption. However, the most efficient SRAM memories for each voltage level differ and it is mandatory trading off different SRAM designs, especially in cache memories, which occupy most of the processor?s area. In this Thesis, we analyze the performance/power tradeoffs involved in the design of SRAM L1 caches for reliable hybrid high and NST Vcc operation from a microarchitectural perspective. We develop new, simple, single-Vcc domain hybrid cache architectures and data management mechanisms that satisfy all stringent needs of our target market. Proposed solutions are shown to have high energy efficiency with negligible impact on average performance while maintaining strong performance guarantees as required for our target market.

[1]  Yu Cao,et al.  New generation of predictive technology model for sub-45nm design exploration , 2006, 7th International Symposium on Quality Electronic Design (ISQED'06).

[2]  A.P. Chandrakasan,et al.  Static noise margin variation for sub-threshold SRAM in 65-nm CMOS , 2006, IEEE Journal of Solid-State Circuits.

[3]  Kanad Ghose,et al.  Reducing power in superscalar processor caches using subbanking, multiple line buffers and bit-line segmentation , 1999, Proceedings. 1999 International Symposium on Low Power Electronics and Design (Cat. No.99TH8477).

[4]  David H. Albonesi,et al.  Selective cache ways: on-demand cache resource allocation , 1999, MICRO-32. Proceedings of the 32nd Annual ACM/IEEE International Symposium on Microarchitecture.

[5]  Jakob Engblom,et al.  The worst-case execution-time problem—overview of methods and survey of tools , 2008, TECS.

[6]  Frank Vahid,et al.  A highly configurable cache architecture for embedded systems , 2003, 30th Annual International Symposium on Computer Architecture, 2003. Proceedings..

[7]  Vivek De,et al.  Adaptive body bias for reducing impacts of die-to-die and within-die parameter variations on microprocessor frequency and leakage , 2002, 2002 IEEE International Solid-State Circuits Conference. Digest of Technical Papers (Cat. No.02CH37315).

[8]  Anantha Chandrakasan,et al.  Full-chip sub-threshold leakage power prediction model for sub-0.18 μm CMOS , 2002, ISLPED '02.

[9]  Jason Liu,et al.  A High-Density Subthreshold SRAM with Data-Independent Bitline Leakage and Virtual Ground Replica Scheme , 2007, 2007 IEEE International Solid-State Circuits Conference. Digest of Technical Papers.

[10]  Kaushik Roy,et al.  Gated-Vdd: a circuit technique to reduce leakage in deep-submicron cache memories , 2000, ISLPED '00.

[11]  James D. Meindl,et al.  Impact of die-to-die and within-die parameter fluctuations on the maximum clock frequency distribution for gigascale integration , 2002, IEEE J. Solid State Circuits.

[12]  Ling Guan,et al.  Optimal Resource Allocation for Pervasive Health Monitoring Systems with Body Sensor Networks , 2011, IEEE Transactions on Mobile Computing.

[13]  Saurabh Dighe,et al.  A 280mV-to-1.2V wide-operating-range IA-32 processor in 32nm CMOS , 2012, 2012 IEEE International Solid-State Circuits Conference.

[14]  Mateo Valero,et al.  Analyzing the Efficiency of L1 Caches for Reliable Hybrid-Voltage Operation Using EDC Codes , 2014, IEEE Transactions on Very Large Scale Integration (VLSI) Systems.

[15]  Pankaj Agarwal,et al.  A low leakage and SNM free SRAM cell design in deep sub micron CMOS technology , 2006, 19th International Conference on VLSI Design held jointly with 5th International Conference on Embedded Systems Design (VLSID'06).

[16]  R.H. Dennard,et al.  An 8T-SRAM for Variability Tolerance and Low-Voltage Operation in High-Performance Caches , 2008, IEEE Journal of Solid-State Circuits.

[17]  Shekhar Y. Borkar,et al.  Design challenges of technology scaling , 1999, IEEE Micro.

[18]  R. Balasubramonian,et al.  Memory hierarchy reconfiguration for energy and performance in general-purpose processor architectures , 2000, Proceedings 33rd Annual IEEE/ACM International Symposium on Microarchitecture. MICRO-33 2000.

[19]  David Blaauw,et al.  Reconfigurable energy efficient near threshold cache architectures , 2008, 2008 41st IEEE/ACM International Symposium on Microarchitecture.

[20]  Anantha Chandrakasan,et al.  Characterizing and modeling minimum energy operation for subthreshold circuits , 2004, Proceedings of the 2004 International Symposium on Low Power Electronics and Design (IEEE Cat. No.04TH8758).

[21]  Yuan Taur,et al.  Fundamentals of Modern VLSI Devices , 1998 .

[22]  W. Huott,et al.  6.6+ GHz Low Vmin, read and half select disturb-free 1.2 Mb SRAM , 2007, 2007 IEEE Symposium on VLSI Circuits.

[23]  Sami Yehia,et al.  The next convergence: High-performance and mission-critical markets , 2013 .

[24]  Francisco J. Cazorla,et al.  Hybrid high-performance low-power and ultra-low energy reliable caches , 2011, CF '11.

[25]  Ming Zhang,et al.  Combinational Logic Soft Error Correction , 2006, 2006 IEEE International Test Conference.

[26]  Nam Sung Kim,et al.  Low-voltage on-chip cache architecture using heterogeneous cell sizes for high-performance processors , 2011, 2011 IEEE 17th International Symposium on High Performance Computer Architecture.

[27]  Y. Nakagome,et al.  Trends in low-power RAM circuit technologies , 1994, Proceedings of 1994 IEEE Symposium on Low Power Electronics.

[28]  Kaushik Roy,et al.  A process-tolerant cache architecture for improved yield in nanoscale technologies , 2005, IEEE Transactions on Very Large Scale Integration (VLSI) Systems.

[29]  Dean M. Tullsen,et al.  Simultaneous multithreading: Maximizing on-chip parallelism , 1995, Proceedings 22nd Annual International Symposium on Computer Architecture.

[30]  Francisco J. Cazorla,et al.  DTM: Degraded Test Mode for Fault-Aware Probabilistic Timing Analysis , 2013, 2013 25th Euromicro Conference on Real-Time Systems.

[31]  Avesta Sasan,et al.  A fault tolerant cache architecture for sub 500mV operation: resizable data composer cache (RDC-cache) , 2009, CASES '09.

[32]  Kaushik Roy,et al.  A 160 mV, fully differential, robust schmitt trigger based sub-threshold SRAM , 2007, Proceedings of the 2007 international symposium on Low power electronics and design (ISLPED '07).

[33]  Francisco J. Cazorla,et al.  RVC-based time-predictable faulty caches for safety-critical systems , 2011, 2011 IEEE 17th International On-Line Testing Symposium.

[34]  Kevin Skadron,et al.  HotLeakage: A Temperature-Aware Model of Subthreshold and Gate Leakage for Architects , 2003 .

[35]  Kaushik Roy,et al.  Reducing set-associative cache energy via way-prediction and selective direct-mapping , 2001, Proceedings. 34th ACM/IEEE International Symposium on Microarchitecture. MICRO-34.

[36]  Tadahiro Kuroda,et al.  Variable supply-voltage scheme for low-power high-speed CMOS digital design , 1998, IEEE J. Solid State Circuits.

[37]  Alaa R. Alameldeen,et al.  Trading off Cache Capacity for Reliability to Enable Low Voltage Operation , 2008, 2008 International Symposium on Computer Architecture.

[38]  Wei Wu,et al.  Improving cache lifetime reliability at ultra-low voltages , 2009, 2009 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[39]  Trevor Mudge,et al.  On-Chip Cache Device Scaling Limits and Effective Fault Repair Techniques in Future Nanoscale Technology , 2007 .

[40]  Mateo Valero,et al.  APPLE: Adaptive Performance-Predictable Low-Energy caches for reliable hybrid voltage operation , 2013, 2013 50th ACM/EDAC/IEEE Design Automation Conference (DAC).

[41]  Narayanan Vijaykrishnan,et al.  A comparative study of power efficient SRAM designs , 2000, ACM Great Lakes Symposium on VLSI.

[42]  N. Muralimanohar,et al.  CACTI 6 . 0 : A Tool to Understand Large Caches , 2007 .

[43]  Francisco J. Cazorla,et al.  The MPsim simulation tool , 2009 .

[44]  Antonio María González Colás,et al.  Low Vccmin fault-tolerant cache with highly predictable performance , 2009, MICRO 2009.

[45]  Zhiyi Yu,et al.  A 167-Processor Computational Platform in 65 nm CMOS , 2009, IEEE Journal of Solid-State Circuits.

[46]  Alvin M. Despain,et al.  Cache design trade-offs for power and performance optimization: a case study , 1995, ISLPED '95.

[47]  David E. Culler,et al.  Lessons from a Sensor Network Expedition , 2004, EWSN.

[48]  Ikuya Kawasaki,et al.  SH3: high code density, low power , 1995, IEEE Micro.

[49]  Matt Welsh,et al.  Deploying a wireless sensor network on an active volcano , 2006, IEEE Internet Computing.

[50]  Kazuaki Murakami,et al.  Way-predicting set-associative cache for high performance and low energy consumption , 1999, Proceedings. 1999 International Symposium on Low Power Electronics and Design (Cat. No.99TH8477).

[51]  L.T. Clark,et al.  An Ultra-Low-Power Memory With a Subthreshold Power Supply Voltage , 2006, IEEE Journal of Solid-State Circuits.

[52]  Jaume Abella,et al.  Power efficient data cache designs , 2003, Proceedings 21st International Conference on Computer Design.

[53]  Toshinori Sato,et al.  Non-uniform Set-Associative Caches for Power-Aware Embedded Processors , 2004, EUC.

[54]  Zhiyu Liu,et al.  High Read Stability and Low Leakage Cache Memory Cell , 2007, 2007 IEEE International Symposium on Circuits and Systems.

[55]  Hai Zhou,et al.  Yield-Aware Cache Architectures , 2006, 2006 39th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'06).

[56]  Chin-Long Chen,et al.  Error-Correcting Codes for Semiconductor Memory Applications: A State-of-the-Art Review , 1984, IBM J. Res. Dev..

[57]  Jung Ho Ahn,et al.  Matching cache access behavior and bit error pattern for high performance low Vcc L1 cache , 2011, 2011 48th ACM/EDAC/IEEE Design Automation Conference (DAC).

[58]  Nora Goldschlager,et al.  Cardiac pacing for the clinician , 2008 .

[59]  David Blaauw,et al.  CAS-FEST 2010: Mitigating Variability in Near-Threshold Computing , 2011, IEEE Journal on Emerging and Selected Topics in Circuits and Systems.

[60]  Amin Ansari,et al.  ZerehCache: Armoring cache architectures in high defect density technologies , 2009, 2009 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[61]  Reinhard Wilhelm,et al.  Fast and Efficient Cache Behavior Prediction , 1997 .

[62]  William H. Mangione-Smith,et al.  The filter cache: an energy efficient memory structure , 1997, Proceedings of 30th Annual International Symposium on Microarchitecture.

[63]  H. J. Arnold Introduction to the Practice of Statistics , 1990 .

[64]  A.P. Chandrakasan,et al.  A 256 kb 65 nm 8T Subthreshold SRAM Employing Sense-Amplifier Redundancy , 2008, IEEE Journal of Solid-State Circuits.

[65]  Akhil Garg,et al.  Fuse Area Reduction Based on Quantitative Yield Analysis and Effective Chip Cost , 2006, 2006 IEEE International SOC Conference.

[66]  Norman P. Jouppi,et al.  Improving direct-mapped cache performance by the addition of a small fully-associative cache and prefetch buffers , 1990, [1990] Proceedings. The 17th Annual International Symposium on Computer Architecture.

[67]  G. Gerosa A sub 2W low power IA processor for Mobile Internet Devices in 45nm Hi-K metal gate CMOS , 2009 .

[68]  T. Mudge,et al.  Drowsy caches: simple techniques for reducing leakage power , 2002, Proceedings 29th Annual International Symposium on Computer Architecture.

[69]  Xiaoxia Wu,et al.  Hybrid cache architecture with disparate memory technologies , 2009, ISCA '09.

[70]  David Blaauw,et al.  Energy Optimality and Variability in Subthreshold Design , 2006, ISLPED'06 Proceedings of the 2006 International Symposium on Low Power Electronics and Design.

[71]  H. Fujiwara,et al.  An Area-Conscious Low-Voltage-Oriented 8T-SRAM Design under DVS Environment , 2007, 2007 IEEE Symposium on VLSI Circuits.

[72]  Michael L. Scott,et al.  Integrating adaptive on-chip storage structures for reduced dynamic power , 2002, Proceedings.International Conference on Parallel Architectures and Compilation Techniques.

[73]  Amin Ansari,et al.  Archipelago: A polymorphic cache design for enabling robust near-threshold operation , 2011, 2011 IEEE 17th International Symposium on High Performance Computer Architecture.

[74]  Mateo Valero,et al.  Efficient cache architectures for reliable hybrid voltage operation using EDC codes , 2013, 2013 Design, Automation & Test in Europe Conference & Exhibition (DATE).

[75]  Miodrag Potkonjak,et al.  MediaBench: a tool for evaluating and synthesizing multimedia and communications systems , 1997, Proceedings of 30th Annual International Symposium on Microarchitecture.

[76]  David Blaauw,et al.  Energy-Efficient Subthreshold Processor Design , 2009, IEEE Transactions on Very Large Scale Integration (VLSI) Systems.

[77]  Margaret Martonosi,et al.  Wattch: a framework for architectural-level power analysis and optimizations , 2000, Proceedings of 27th International Symposium on Computer Architecture (IEEE Cat. No.RS00201).

[78]  Yiannakis Sazeides,et al.  The Performance Vulnerability of Architectural and Non-architectural Arrays to Permanent Faults , 2012, 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture.

[79]  David Blaauw,et al.  Yield-Driven Near-Threshold SRAM Design , 2010, IEEE Transactions on Very Large Scale Integration (VLSI) Systems.

[80]  Yiannakis Sazeides,et al.  Performance-effective operation below Vcc-min , 2010, 2010 IEEE International Symposium on Performance Analysis of Systems & Software (ISPASS).

[81]  A.P. Chandrakasan,et al.  A 256-kb 65-nm Sub-threshold SRAM Design for Ultra-Low-Voltage Operation , 2007, IEEE Journal of Solid-State Circuits.

[82]  Mateo Valero,et al.  ADAM: an efficient data management mechanism for hybrid high and ultra-low voltage operation caches , 2012, GLSVLSI '12.