Dynamic Thermal Management for Distributed Systems

In modern data centers, increased scale and power density have an enormous impact on thermal properties and pose new challenges for the designers of both computing and cooling systems. Control-theoretic techniques have proven able to manage heat dissipation and temperature and to avoid thermal emergencies, but they are not aware of the task currently executing or of its specific service requirements. In this work, we investigate an approach to dynamic thermal management that takes into account the demands of individual applications, users, or services. We show that energy consumption and temperature can be determined at a fine-grained level and without the need for measurement, using information from event monitors embedded in modern processors. We extend the well-known abstraction of resource containers to an infrastructure for transparent energy and temperature management in distributed systems. In a cluster-based server, the processing of a request can be throttled to meet the thermal requirements of the system, even if machine boundaries are crossed, e.g., by remote procedure calls in a client/server relationship. With this facility, energy consumption can be accounted for and the resulting heat generation controlled precisely without the need for expensive hardware. Experiments on a Pentium 4 architecture show that energy and temperature are determined accurately and that the thermal limits for the individual CPU and for the whole distributed system are not exceeded. Use cases and important implications of our approach are discussed.
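The idea of deriving energy and temperature from processor event counts, and throttling when a thermal limit is reached, can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the work itself: the linear weighted-sum energy model, the event names and weights in `EVENT_WEIGHTS_NJ`, the first-order thermal RC model and its constants, and the helper `should_throttle`. Real weights and thermal constants would have to be calibrated for a given processor and heat sink.

```python
from dataclasses import dataclass

# Hypothetical energy cost per event in nanojoules (illustrative values only).
EVENT_WEIGHTS_NJ = {
    "cycles": 6.0,
    "uops_retired": 4.0,
    "l2_misses": 300.0,
}


def estimate_energy_joules(event_counts):
    """Estimate energy as a weighted sum of event counts (assumed linear model)."""
    total_nj = sum(EVENT_WEIGHTS_NJ[name] * count
                   for name, count in event_counts.items())
    return total_nj * 1e-9


@dataclass
class ThermalModel:
    """Assumed first-order RC model: dT/dt = P/C - (T - T_ambient)/(R*C)."""
    ambient_c: float = 25.0
    resistance_k_per_w: float = 0.5    # hypothetical thermal resistance
    capacitance_j_per_k: float = 20.0  # hypothetical thermal capacitance
    temperature_c: float = 25.0

    def step(self, energy_j, interval_s):
        """Advance the temperature estimate by one interval (explicit Euler)."""
        power_w = energy_j / interval_s
        heating = power_w / self.capacitance_j_per_k
        cooling = (self.temperature_c - self.ambient_c) / (
            self.resistance_k_per_w * self.capacitance_j_per_k)
        self.temperature_c += (heating - cooling) * interval_s
        return self.temperature_c


def should_throttle(temperature_c, limit_c):
    """Return True once the estimated temperature reaches the limit."""
    return temperature_c >= limit_c


if __name__ == "__main__":
    model = ThermalModel()
    counts = {"cycles": 2_000_000_000, "uops_retired": 1_500_000_000,
              "l2_misses": 5_000_000}
    for second in range(5):
        energy = estimate_energy_joules(counts)
        temp = model.step(energy, interval_s=1.0)
        print(second, round(energy, 2), round(temp, 2),
              should_throttle(temp, limit_c=30.0))
```

In this sketch the accounting is done per sampling interval. Attributing the counts to an individual request or resource container, and carrying that attribution across machine boundaries, is the part the abstract describes and is not modeled here.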
