Joint optimization of idle and cooling power in data centers while maintaining response time

Server power and cooling power amount to a significant fraction of modern data centers' recurring costs. While data centers provision enough servers to guarantee response times under the maximum loading, data centers operate under much less loading most of the times (e.g., 30-70% of the maximum loading). Previous server-power proposals exploit this under-utilization to reduce the server idle power by keeping active only as many servers as necessary and putting the rest into low-power standby modes. However, these proposals incur higher cooling power due to hot spots created by concentrating the data center loading on fewer active servers, or degrade response times due to standby-to-active transition delays, or both. Other proposals optimize the cooling power but incur considerable idle power. To address the first issue of power, we propose PowerTrade, which trades-off idle power and cooling power for each other, thereby reducing the total power. To address the second issue of response time, we propose SurgeGuard to overprovision the number of active servers beyond that needed by the current loading so as to absorb future increases in the loading. SurgeGuard is a two-tier scheme which uses well-known over-provisioning at coarse time granularities (e.g., one hour) to absorb the common, smooth increases in the loading, and a novel fine-grain replenishment of the over-provisioned reserves at fine time granularities (e.g., five minutes) to handle the uncommon, abrupt loading surges. Using real-world traces, we show that combining PowerTrade and SurgeGuard reduces total power by 30% compared to previous low-power schemes while maintaining response times within 1.7%.

[1]  Amin Vahdat,et al.  Managing energy and server resources in hosting centers , 2001, SOSP.

[2]  David E. Irwin,et al.  Ensemble-level Power Management for Dense Blade Servers , 2006, 33rd International Symposium on Computer Architecture (ISCA'06).

[3]  Enrique V. Carrera,et al.  Load balancing and unbalancing for power and performance in cluster-based systems , 2001 .

[4]  Michael Kistler,et al.  The case for power management in web servers , 2002 .

[5]  E. N. Elnozahy,et al.  Energy-Efficient Server Clusters , 2002, PACS.

[6]  Trevor N. Mudge,et al.  Understanding and Designing New Server Architectures for Emerging Warehouse-Computing Environments , 2008, 2008 International Symposium on Computer Architecture.

[7]  Jeffrey S. Chase,et al.  Balance of power: dynamic thermal management for Internet data centers , 2005, IEEE Internet Computing.

[8]  Vanish Talwar,et al.  No "power" struggles: coordinated multi-level power management for the data center , 2008, ASPLOS.

[9]  Prashant J. Shenoy,et al.  Resource overbooking and application profiling in shared hosting platforms , 2002, OSDI '02.

[10]  Ricardo Bianchini,et al.  C-Oracle: Predictive thermal management for data centers , 2008, 2008 IEEE 14th International Symposium on High Performance Computer Architecture.

[11]  Sandeep K. S. Gupta,et al.  Software Architecture for Dynamic Thermal Management in Datacenters , 2007, 2007 2nd International Conference on Communication Systems Software and Middleware.

[12]  Luiz André Barroso,et al.  The Case for Energy-Proportional Computing , 2007, Computer.

[13]  Thomas F. Wenisch,et al.  PowerNap: eliminating server idle power , 2009, ASPLOS.

[14]  Arnold O. Allen,et al.  Probability, statistics and queueing theory - with computer science applications (2. ed.) , 1981, Int. CMG Conference.

[15]  Gunter Bolch,et al.  Queueing Networks and Markov Chains , 2005 .

[16]  Karsten Schwan,et al.  VirtualPower: coordinated power management in virtualized enterprise systems , 2007, SOSP.

[17]  Wolf-Dietrich Weber,et al.  Power provisioning for a warehouse-sized computer , 2007, ISCA '07.

[18]  Ricardo Bianchini,et al.  Energy conservation in heterogeneous server clusters , 2005, PPoPP.

[19]  William LeFebvre,et al.  CNN.com: Facing a World Crisis , 2001, LiSA.

[20]  Peter D. Welch,et al.  On a Generalized M/G/1 Queuing Process in Which the First Customer of Each Busy Period Receives Exceptional Service , 1964 .

[21]  Joonwon Lee,et al.  Modeling and Managing Thermal Profiles of Rack-mounted Servers with ThermoStat , 2007, 2007 IEEE 13th International Symposium on High Performance Computer Architecture.

[22]  Christian Belady,et al.  GREEN GRID DATA CENTER POWER EFFICIENCY METRICS: PUE AND DCIE , 2008 .

[23]  Anand Sivasubramaniam,et al.  Managing server energy and operational costs in hosting centers , 2005, SIGMETRICS '05.

[24]  Jeffrey S. Chase,et al.  Weatherman: Automated, Online and Predictive Thermal Mapping and Management for Data Centers , 2006, 2006 IEEE International Conference on Autonomic Computing.

[25]  James R. Hamilton,et al.  Internet-scale service infrastructure efficiency , 2009, ISCA '09.

[26]  Claudio Scordino,et al.  Energy-Efficient Real-Time Heterogeneous Server Clusters , 2006, 12th IEEE Real-Time and Embedded Technology and Applications Symposium (RTAS'06).

[27]  Kevin Skadron,et al.  Power-aware QoS management in Web servers , 2003, RTSS 2003. 24th IEEE Real-Time Systems Symposium, 2003.

[28]  Jeffrey S. Chase,et al.  Making Scheduling "Cool": Temperature-Aware Workload Placement in Data Centers , 2005, USENIX Annual Technical Conference, General Track.

[29]  Gunter Bolch,et al.  Queueing Networks and Markov Chains - Modeling and Performance Evaluation with Computer Science Applications, Second Edition , 1998 .