Increasing large-scale data center capacity by statistical power control

Given the high cost of large-scale data centers, an important design goal is to fully utilize the available power resources to maximize computing capacity. In this paper we present Ampere, a novel power management system for data centers that increases computing capacity by over-provisioning the number of servers. Rather than relying on power capping, which degrades the performance of running jobs, we use a statistical control approach that implements dynamic power management by indirectly affecting workload scheduling, which greatly reduces the risk of power violations. Rather than being a part of the already over-complicated scheduler, Ampere interacts with the scheduler only through two basic APIs. Rather than controlling power at the rack level, we impose the power constraint at the row level, which leaves more room for over-provisioning. We have implemented and deployed Ampere in our production data center. Controlled experiments on 400+ servers show that by adding 17% more servers, we can increase the throughput of the data center by 15%, leading to significant cost savings without disturbing job performance.
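The claim that a row-level constraint leaves more room than a rack-level one can be illustrated with a minimal sketch. It is not Ampere's implementation: the power samples, rack names, and function names below are hypothetical and chosen only to show that, when racks do not peak at the same time, the peak of the row's total draw is lower than the sum of the individual rack peaks.

```python
# Hypothetical per-rack power samples in kW at four points in time.
# These numbers are illustrative assumptions, not measurements from the paper.
rack_power = [
    [4.0, 6.5, 5.0, 3.5],  # rack A
    [6.0, 3.0, 4.5, 6.5],  # rack B
    [3.5, 5.0, 6.5, 4.0],  # rack C
]


def sum_of_rack_peaks(samples):
    """Power that must be reserved if every rack is budgeted for its own peak."""
    return sum(max(rack) for rack in samples)


def row_peak(samples):
    """Power that must be reserved if only the row total is constrained."""
    return max(sum(step) for step in zip(*samples))


rack_level = sum_of_rack_peaks(rack_power)
row_level = row_peak(rack_power)

# Budget freed by constraining the row instead of each rack separately.
freed = rack_level - row_level
print(f"rack-level reservation: {rack_level} kW")
print(f"row-level reservation:  {row_level} kW")
print(f"freed for extra servers: {freed} kW")
```

In this toy data the racks peak at different times, so the row-level reservation is smaller. If all racks peaked together, the two figures would coincide and the row-level constraint would offer no extra room.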
