Robust control-theoretic thermal balancing for server clusters

Thermal management is critical for clusters because of the increasing power consumption of modern processors, compact server architectures, and growing server density in data centers. Thermal balancing mitigates hot spots in a cluster by dynamically distributing load among servers. This paper presents two Control-theoretical Thermal Balancing (CTB) algorithms that dynamically balance the temperatures of different servers based on online measurements. CTB features controllers rigorously designed using optimal control theory and a difference equation model that approximates the thermal dynamics of clusters. Control analysis and simulation results demonstrate that CTB achieves robust thermal balancing under a wide range of uncertainties: (1) when different tasks consume different amounts of CPU power, (2) when servers experience different ambient temperatures, and (3) when servers experience thermal faults.
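
To make the idea of feedback-based thermal balancing concrete, the following is a minimal sketch and not the CTB controllers themselves. It assumes a first-order thermal-RC difference equation per server and a simple integral-style rule that moves load away from servers hotter than the cluster mean; the paper's controllers are instead derived from optimal control theory. All names and constants (`step_temperatures`, `rebalance`, `simulate`, the resistance, capacitance, power, and gain values) are hypothetical and chosen only for illustration.

```python
def step_temperatures(temps, utils, ambients, dt=1.0, r=0.5, c=20.0,
                      p_idle=50.0, p_per_util=100.0):
    """Advance each server by one step of an assumed first-order RC model:
    T(k+1) = T(k) + (dt / c) * (P(k) - (T(k) - T_amb) / r),
    where P(k) = p_idle + p_per_util * u(k) is an assumed linear power model.
    """
    new_temps = []
    for t, u, t_amb in zip(temps, utils, ambients):
        power = p_idle + p_per_util * u
        new_temps.append(t + (dt / c) * (power - (t - t_amb) / r))
    return new_temps


def rebalance(temps, utils, gain=0.002):
    """Shift utilization away from servers hotter than the cluster mean.

    The corrections sum to zero, so total utilization is preserved.
    Saturation of utilization at 0 or 1 is ignored in this sketch.
    """
    mean_temp = sum(temps) / len(temps)
    return [u - gain * (t - mean_temp) for t, u in zip(temps, utils)]


def simulate(steps=500):
    """Two servers with different ambient temperatures (uncertainty case 2)."""
    ambients = [25.0, 30.0]   # hypothetical inlet temperatures, deg C
    temps = [40.0, 40.0]      # hypothetical initial CPU temperatures
    utils = [0.5, 0.5]        # load starts evenly split
    for _ in range(steps):
        utils = rebalance(temps, utils)                      # measure, then actuate
        temps = step_temperatures(temps, utils, ambients)    # plant responds
    return temps, utils


if __name__ == "__main__":
    final_temps, final_utils = simulate()
    print("temperatures:", final_temps)
    print("utilizations:", final_utils)
```

With these assumed constants, the server with the cooler ambient temperature ends up carrying more load, and the two temperatures converge to a common value even though the controller never measures the ambient temperatures directly. The gain was picked so that this particular closed loop is stable; a different plant model would need its own analysis.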
