Emulation of an ASIC power and temperature monitoring system (eTPMon) for FPGA prototyping

Abstract

Hardware monitoring information can be used at runtime to increase system lifetime and reliability. Examples are the power, temperature, and aging status of processors, which tell the system about the current health of its hardware. Such information is especially crucial in resource-aware computing, a concept that introduces self-organizing behavior to cope with large MPSoCs (Multi-Processor Systems-on-Chip). In resource-aware computing, resources are allocated according to current requirements. To find suitable resource-application pairs and meet system targets such as optimized utilization, the current hardware status must be considered during resource allocation. Evaluating and optimizing resource allocation strategies during the design phase often requires FPGA prototyping before the design is implemented as an ASIC. However, power, temperature, and aging evolve differently in the ASIC implementation than in the FPGA prototype, so the prototype should react to sensor data characterized for the target ASIC design rather than to the FPGA's own hardware status. This paper describes the design of an emulated ASIC Temperature and Power Monitoring system (eTPMon) for FPGA-based prototyping. Power monitors are emulated with an instruction-level energy model, and temperature monitors with a thermal RC model. eTPMon supplies MPSoC prototypes with the hardware status information (power and temperature of the cores) needed for efficient load distribution and thus for achieving resource-aware computing targets. Based on the eTPMon data, different operating strategies and control targets were evaluated for a 2-tile resource-aware MPSoC system. The values provided by eTPMon can also be used to extract information about processor aging, which in turn can be used to increase system lifetime.
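To make the two modeling ideas named in the abstract concrete, the following is a minimal software sketch, not the eTPMon hardware design. It assumes a lookup table of per-instruction-class energies for the power estimate and a single-node thermal RC network (one thermal resistance and one thermal capacitance to ambient) integrated with forward Euler for the temperature estimate. All names (`ENERGY_NJ`, `window_power_w`, `ThermalRC`) and all numeric values are hypothetical placeholders, not figures from the paper.

```python
from dataclasses import dataclass

# Hypothetical energy per instruction class in nanojoules (illustrative values,
# not characterized from any real ASIC design).
ENERGY_NJ = {"alu": 0.10, "load": 0.25, "store": 0.22, "branch": 0.12, "nop": 0.05}


def window_power_w(instr_counts, window_s, static_w=0.05):
    """Average power over one sampling window.

    instr_counts maps an instruction class to the number of instructions of
    that class retired in the window; static_w is an assumed constant
    leakage/idle power added on top of the dynamic part.
    """
    dynamic_j = sum(ENERGY_NJ[cls] * n for cls, n in instr_counts.items()) * 1e-9
    return static_w + dynamic_j / window_s


@dataclass
class ThermalRC:
    """Single-node thermal RC model: C * dT/dt = P - (T - T_amb) / R."""
    r_th: float = 20.0    # thermal resistance to ambient in K/W (assumed)
    c_th: float = 0.005   # thermal capacitance in J/K (assumed)
    t_amb: float = 45.0   # ambient temperature in degrees Celsius (assumed)
    temp: float = 45.0    # current core temperature in degrees Celsius

    def step(self, power_w, dt_s):
        # Forward Euler update; needs dt_s well below r_th * c_th to stay stable.
        dtemp_dt = (power_w - (self.temp - self.t_amb) / self.r_th) / self.c_th
        self.temp += dt_s * dtemp_dt
        return self.temp


if __name__ == "__main__":
    core = ThermalRC()
    counts = {"alu": 1_000_000, "load": 200_000}  # one 1 ms window of activity
    for _ in range(1000):
        power = window_power_w(counts, window_s=1e-3)
        core.step(power, dt_s=1e-3)
    print(f"power = {power:.3f} W, temperature = {core.temp:.2f} C")
```

Under a constant power P this model settles at T_amb + P * R, which gives a quick sanity check for any chosen parameters. A multi-core prototype would need one node per core plus coupling resistances between neighboring cores; that extension is left out here.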
