Reducing Overheating-Induced Failures Via Performance-Aware CPU Power Management

Cluster end-users and administrators have become more cognizant of the fact that large-scale commodity clusters fail quite frequently, and that the main source of these failures is hardware (e.g., processors), with heat as the primary cause. This situation is expected to worsen with even larger-scale clusters powered by faster (and/or multi-core) processors. In this paper, we propose a power-management algorithm that addresses heat-related reliability for processors by controlling their clock speeds in a performance-aware manner. This approach is complementary to existing approaches such as exotic cooling and fault-tolerant technologies in that it proactively deals with power and cooling issues before they become a problem. Our preliminary experimental work demonstrates that our approach can easily be applied to commodity processors and can reduce heat generation by 30% on average with minimal effect on performance when running the SPEC benchmarks.
