Thermal management for dependable on-chip systems

Dependability has become a growing concern in the nano-CMOS era, because chips run at elevated temperatures and their ever-smaller structures are increasingly susceptible to temperature. We present an overview of the temperature-related effects that threaten dependability, together with a methodology that mitigates these concerns through thermal management based on the concept of aging budgeting.
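The abstract names aging budgeting but does not define it. The sketch below is a rough illustration of the underlying idea: temperature sets how fast a chip uses up its lifetime, so a thermal manager can cap temperature to keep wear within a budget. It assumes a single wear-out mechanism with Arrhenius temperature dependence. The class `AgingBudget`, the reference temperature, and the activation energy are hypothetical choices made for illustration. This is not the authors' methodology.

```python
import math
from typing import Optional

BOLTZMANN_EV = 8.617333262e-5  # Boltzmann constant in eV/K


def acceleration_factor(temp_k: float, ref_temp_k: float, ea_ev: float) -> float:
    """Arrhenius acceleration of aging at temp_k relative to ref_temp_k (assumed model)."""
    return math.exp(ea_ev / BOLTZMANN_EV * (1.0 / ref_temp_k - 1.0 / temp_k))


class AgingBudget:
    """Hypothetical aging-budget tracker.

    Aging is counted in "reference hours", i.e. hours of operation at ref_temp_k.
    The budget allows one reference hour of aging per elapsed hour, so a chip
    that stays within budget reaches the lifetime it was designed for at ref_temp_k.
    """

    def __init__(self, ref_temp_k: float, ea_ev: float):
        self.ref_temp_k = ref_temp_k
        self.ea_ev = ea_ev
        self.elapsed_h = 0.0   # wall-clock operating hours
        self.consumed_h = 0.0  # aging consumed, in reference hours

    def record(self, temp_k: float, duration_h: float) -> None:
        """Account for an interval spent at a (constant) temperature."""
        self.elapsed_h += duration_h
        self.consumed_h += duration_h * acceleration_factor(
            temp_k, self.ref_temp_k, self.ea_ev)

    def slack_h(self) -> float:
        """Positive: aging lags the budget, so there is headroom to run hotter."""
        return self.elapsed_h - self.consumed_h

    def max_temp_k(self, duration_h: float) -> Optional[float]:
        """Highest constant temperature for the next interval that keeps slack >= 0.

        Returns None if the budget is already overdrawn by more than the interval
        can repay, and math.inf if any temperature would stay within budget.
        """
        allowed_af = (self.slack_h() + duration_h) / duration_h
        if allowed_af <= 0.0:
            return None
        # Invert the Arrhenius factor: 1/T = 1/T_ref - k*ln(AF)/Ea
        inv_t = 1.0 / self.ref_temp_k - BOLTZMANN_EV * math.log(allowed_af) / self.ea_ev
        return math.inf if inv_t <= 0.0 else 1.0 / inv_t
```

Counting wear in reference hours reduces the budget check to a single comparison between consumed and elapsed time. A thermal manager could then use `max_temp_k` as the temperature cap for the next control interval. Real mechanisms such as NBTI, with its stress recovery and voltage dependence, would require richer models than this single Arrhenius term.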
