Temperature aware task scheduling in MPSoCs

In deep submicron circuits, elevated temperatures pose new challenges for reliability, timing, performance, cooling costs, and leakage power. Conventional thermal management techniques sacrifice performance to control thermal behavior: they slow down or turn off processors when a critical temperature threshold is exceeded. Moreover, studies have shown that, in addition to high temperatures, temporal and spatial variations in temperature affect system reliability. In this work, we explore the benefits of thermally aware task scheduling for multiprocessor systems-on-a-chip (MPSoCs). We design and evaluate OS-level dynamic scheduling policies with negligible performance overhead. We show that simple-to-implement policies that make decisions based on temperature measurements achieve better temporal and spatial thermal profiles than state-of-the-art schedulers. We also enhance reactive strategies such as dynamic thread migration with our scheduling policies. As a result, hot spots and temperature variations are decreased, and the performance cost is significantly reduced.
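
To make the idea of a policy that decides from temperature measurements concrete, the following is a minimal sketch of one such rule: dispatch the next task to the coolest idle core. It is an illustration only and is not taken from the paper. The function name `pick_core`, the per-core temperature map, and the 85 °C cutoff are all hypothetical choices.

```python
from typing import Dict, Optional, Set


def pick_core(
    temps_c: Dict[int, float],
    busy: Set[int],
    threshold_c: float = 85.0,  # hypothetical cutoff, not a value from the paper
) -> Optional[int]:
    """Return the idle core with the lowest temperature reading.

    Cores at or above threshold_c are skipped. Ties are broken by the
    lower core id. Returns None when no core qualifies.
    """
    candidates = [
        core
        for core, temp in temps_c.items()
        if core not in busy and temp < threshold_c
    ]
    if not candidates:
        return None
    return min(candidates, key=lambda core: (temps_c[core], core))


# Illustrative sensor readings in degrees Celsius (made-up values).
readings = {0: 70.0, 1: 62.5, 2: 80.0, 3: 90.0}
chosen = pick_core(readings, busy={1})
```

With these made-up readings, core 1 is the coolest but busy and core 3 exceeds the cutoff, so the sketch selects core 0. A real policy would also need to account for sensor noise, task affinity, and migration cost, which this sketch ignores.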
