A Scalable Priority-Aware Approach to Managing Data Center Server Power

Power management is a key component of modern data center design. Power managers must (1) ensure the cost- and energy-efficient utilization of the data center infrastructure, (2) maintain availability of the services provided by the center, and (3) address environmental concerns associated with the center’s power consumption. While several power management techniques have been proposed and deployed in production data centers, there are still many challenges to comprehensive data center power management. This is particularly true in public cloud environments, where different jobs have different priority levels, and where high availability is critical. One example of the challenges facing public cloud data centers involves power capping. As power delivery must be highly reliable and tolerate wide variation in the load drawn by the data center components, the power infrastructure (e.g., power supplies, circuit breakers, UPS) has high redundancy and overprovisioning. During normal operation (i.e., typical server power demands, and no failures in the center), the power infrastructure is significantly underutilized. Power capping is a common solution to reduce this underutilization: it allows more servers to be added safely (i.e., without power shortfalls) to the existing power infrastructure, and it throttles power consumption in the infrequent cases where the demanded power exceeds the provisioned power capacity. However, state-of-the-art power capping solutions are (1) not directly applicable to the redundant power infrastructure used in highly-available data centers; and (2) oblivious to differing workload priorities across the entire center when power consumption needs to be throttled, which can unnecessarily slow down high-priority work. To address these limitations, we develop CapMaestro, a new power management architecture with three key features for public cloud data centers.
First, CapMaestro is designed to work with multiple power feeds (i.e., sources), and exploits server-level power capping to independently cap the load on each feed of a server. Second, CapMaestro uses a scalable, global priority-aware power capping approach, which accounts for power capacity at each level of the power distribution hierarchy. It exploits the underutilization of commonly-employed redundant power infrastructure at each level of the hierarchy to safely accommodate a much greater number of servers. Third, CapMaestro exploits stranded power (i.e., power budgets that are not utilized) in redundant power infrastructure to boost the performance of workloads in the data center. We add CapMaestro to a real cloud data center control plane, and demonstrate the effectiveness of all three key features. Using a large-scale data center simulation, we demonstrate that CapMaestro significantly and safely increases the number of servers for existing infrastructure. We also call out other key technical challenges the industry faces in data center power management.
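
The idea of priority-aware capping can be illustrated with a minimal sketch. This is not CapMaestro's algorithm: it assumes a single shared power limit (one feed, one level of the hierarchy), and the `Server` fields, the `allocate_budgets` function, and the convention that a larger `priority` value means more important work are all hypothetical names chosen for illustration. Every server first receives its minimum power; the remaining capacity then goes to higher-priority servers first, up to their demand.

```python
from dataclasses import dataclass


@dataclass
class Server:
    name: str
    priority: int    # assumption: larger value = more important workload
    demand_w: float  # power the server would draw if left uncapped
    min_w: float     # lowest power the server can be throttled down to


def allocate_budgets(servers, capacity_w):
    """Split one shared power limit into per-server budgets (watts)."""
    floor_total = sum(s.min_w for s in servers)
    if floor_total > capacity_w:
        raise ValueError("capacity cannot cover the minimum power of all servers")

    # Step 1: every server is guaranteed its minimum power.
    budgets = {s.name: s.min_w for s in servers}
    remaining = capacity_w - floor_total

    # Step 2: hand out the rest in descending priority order, up to demand.
    # Ties keep their input order (sorted() is stable), a simplification.
    for s in sorted(servers, key=lambda s: s.priority, reverse=True):
        extra = max(min(s.demand_w - s.min_w, remaining), 0)
        budgets[s.name] += extra
        remaining -= extra
    return budgets
```

With a 900 W limit and three servers demanding 1100 W in total, for example, the two higher-priority servers in such a setup would keep their full demand while the low-priority one is throttled to its minimum. A hierarchy-aware scheme would have to repeat this kind of allocation at each level of the power distribution tree and separately for each feed.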
