PowerStar: Improving Power Efficiency in Heterogeneous Processors for Bursty Workloads with Approximate Computing

Modern data centers have increasingly adopted heterogeneous processors in their server nodes to maximize power efficiency. However, it remains challenging to configure these processors so that throughput is maximized under fluctuating workloads while system power consumption is kept low. In this paper, we propose PowerStar, a framework for latency-critical workloads that maximizes power efficiency and reduces the number of reconfigurations needed in heterogeneous processors when job arrival patterns fluctuate. PowerStar is built on two key observations: (i) reconfiguring a heterogeneous processor, whether by adding cores to enable higher performance or by re-allocating computing cores, can be costly because of the extra latency and the associated energy overheads; (ii) considerable energy savings can be achieved by keeping the system in the most power-efficient configurations that can absorb short bursts in job arrivals without reconfiguration. PowerStar operates by carefully choosing the most power-efficient configurations (states) and judiciously maximizing state residency through the controlled use of approximate computing, when feasible. We implement PowerStar as a prototype on a 6-core ARM big.LITTLE heterogeneous platform and evaluate it with a variety of workloads. Our results show that, compared to a baseline performance-driven power management policy, the power-efficiency-aware PowerStar reduces average power by up to 11% under tight QoS constraints (95th percentile latency under 3× job execution latency) and by up to 32% under relaxed QoS constraints (95th percentile latency under 10× job execution latency).
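
The decision logic described in the abstract can be illustrated with a minimal Python sketch. It is not PowerStar's actual implementation: the `Config` fields, the capacity and power numbers, the `approx_speedup` factor, and all function names are hypothetical, introduced only to show the idea of staying in a power-efficient state and using approximation to absorb a short burst before falling back to reconfiguration. The QoS check follows the form stated above (95th percentile latency under a multiple of the job execution latency).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Config:
    """A processor configuration (state). All values are hypothetical."""
    name: str
    capacity: float  # job arrival rate (jobs/s) the state sustains within QoS
    power: float     # average power draw (W) in this state


def p95(latencies):
    """Nearest-rank 95th percentile of a non-empty list of latencies."""
    ordered = sorted(latencies)
    rank = (95 * len(ordered) + 99) // 100  # integer ceil(0.95 * n)
    return ordered[rank - 1]


def meets_qos(latencies, job_exec_latency, slack):
    """QoS holds if p95 latency is within slack x job execution latency."""
    return p95(latencies) <= slack * job_exec_latency


def choose_config(configs, arrival_rate):
    """Pick the lowest-power state that can sustain the arrival rate."""
    feasible = [c for c in configs if c.capacity >= arrival_rate]
    if not feasible:
        # Nothing sustains the load: fall back to the highest-capacity state.
        return max(configs, key=lambda c: c.capacity)
    return min(feasible, key=lambda c: c.power)


def next_step(current, configs, arrival_rate, approx_speedup):
    """Return (state, use_approximation) for the observed arrival rate.

    approx_speedup is an assumed factor by which approximate computing
    raises the capacity of the current state (1.0 means no benefit).
    """
    if arrival_rate <= current.capacity:
        return current, False  # no burst: stay, exact execution
    if arrival_rate <= current.capacity * approx_speedup:
        return current, True   # absorb the burst without reconfiguring
    return choose_config(configs, arrival_rate), False  # reconfigure


configs = [
    Config("small", capacity=10.0, power=1.0),
    Config("medium", capacity=20.0, power=1.8),
    Config("large", capacity=50.0, power=5.0),
]
current = choose_config(configs, arrival_rate=15.0)
burst = next_step(current, configs, arrival_rate=24.0, approx_speedup=1.3)
surge = next_step(current, configs, arrival_rate=30.0, approx_speedup=1.3)
```

In this sketch, a burst of 24 jobs/s is absorbed in the `medium` state with approximation enabled, while a surge of 30 jobs/s exceeds what approximation can cover and triggers a reconfiguration. A real controller would also have to account for the reconfiguration latency and energy overheads noted in observation (i), which this sketch leaves out.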
