Thermal-aware lifetime reliability in multicore systems

As the power density of modern electronic circuits increases dramatically, systems become prone to overheating. High temperatures not only raise packaging costs, degrade system performance, and increase leakage power consumption, but also reduce system reliability. Because single-core designs face many limits, including performance and power density, the microprocessor industry has shifted its attention to multicore designs to keep performance scaling. Thermal effects nevertheless remain a prominent issue in multicore systems. One such effect is the loss of lifetime reliability at high temperature, which has become a serious concern. In this paper, we address the problem of how to maximize the lifetime of a multicore system while maintaining a given aggregate processor speed. By applying sequential quadratic programming, we show how to derive the ideal speed for each core so as to maximize the system lifetime. We perform experiments on several multicore platforms, which show that the proposed method can significantly outperform the existing approaches by minimizing the peak temperature of the system.
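The abstract does not spell out the optimization problem, so the following is only one plausible way to write it down, not the formulation used in the paper. It assumes a steady-state linear thermal model with thermal resistance matrix R and ambient temperature T_amb, a cubic speed-to-power relation with hypothetical constants alpha and beta, an Arrhenius-type lifetime model with scale A and activation energy E_a, and a system lifetime defined by the first core to fail. None of these symbols appear in the source.

```latex
% Illustrative formulation only; all symbols are assumptions.
\begin{align}
  \max_{s_1,\dots,s_N}\quad & \min_{1 \le i \le N} \mathrm{MTTF}_i(T_i) \\
  \text{subject to}\quad
    & \sum_{i=1}^{N} s_i = S_{\mathrm{total}}, \qquad
      s_{\min} \le s_i \le s_{\max}, \\
    & P_i = \alpha s_i^{3} + \beta, \\
    & T_i = T_{\mathrm{amb}} + \sum_{j=1}^{N} R_{ij} P_j, \\
    & \mathrm{MTTF}_i(T_i) = A \exp\!\left(\frac{E_a}{k T_i}\right).
\end{align}

% Epigraph form: the non-smooth inner minimum is replaced by an
% auxiliary variable t, leaving smooth constraints.
\begin{align}
  \max_{t,\, s_1,\dots,s_N}\quad & t \\
  \text{subject to}\quad
    & t \le \mathrm{MTTF}_i(T_i), \quad i = 1,\dots,N,
\end{align}
% together with the speed, power, and temperature constraints above.
```

Under these assumptions the epigraph form has a smooth objective and smooth constraints, which is the kind of problem a sequential quadratic programming solver is designed to handle.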
