Electro: Toward QoS-Aware Power Management for Latency-Critical Applications

Reducing the energy consumption of datacenters is critical for their scalability, sustainability, and affordability when hosting latency-critical applications. Prior work has focused on single-threaded applications with stable workloads. Multi-threaded latency-sensitive services, however, are now widely deployed in datacenters, and the variability of user queries in these services makes existing schemes ineffective, leading to either QoS violations or higher energy consumption. To address this problem, we propose Electro, a machine-learning-enhanced dynamic power management system. Electro consists of a query duration predictor and a query consolidating engine. The duration predictor uses pre-trained duration models to predict the duration of each user query in different scenarios. At runtime, the query consolidating engine uses the predicted durations to consolidate user queries so as to maximize the duration of the CPU idle states while guaranteeing the QoS. The longer each idle period is, the deeper the low-power sleep state the CPU can enter. Our evaluation on the latest Intel Xeon V4 CPU shows that Electro reduces energy consumption by 81.8% on average compared with the default OS scheduling, and by 14.4% on average compared with the state-of-the-art technique, while meeting the 95th-percentile latency target for latency-sensitive applications.
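The consolidation idea can be illustrated with a minimal sketch. This is not Electro's actual algorithm, which the abstract does not specify. It assumes a single core, FIFO execution, queries that have already arrived, and a fixed per-query latency target; the names `Query`, `latest_safe_start`, and `idle_window` are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Query:
    arrival: float             # arrival time in ms
    predicted_duration: float  # predicted service time in ms (assumed to come from a duration model)


def latest_safe_start(pending: List[Query], qos_target: float) -> float:
    """Latest wake-up time at which running `pending` back to back, in FIFO
    order, still lets every query finish within `qos_target` ms of its arrival.
    Returns infinity when nothing is pending."""
    start = float("inf")
    elapsed = 0.0  # total predicted work up to and including the current query
    for q in pending:
        elapsed += q.predicted_duration
        deadline = q.arrival + qos_target
        # Query q finishes at start + elapsed, which must not exceed its deadline.
        start = min(start, deadline - elapsed)
    return start


def idle_window(now: float, pending: List[Query], qos_target: float) -> float:
    """How long the core could remain idle from `now` under the assumptions above."""
    return max(0.0, latest_safe_start(pending, qos_target) - now)


pending = [Query(0.0, 2.0), Query(1.0, 3.0), Query(4.0, 1.0)]
window = idle_window(now=4.0, pending=pending, qos_target=10.0)
```

With a 10 ms target, the second query is the binding constraint in this example: it must finish by 11 ms after 5 ms of accumulated work, so the core can wake no later than 6 ms. A real system would also have to account for prediction error, sleep-state wake-up latency, and queries that arrive during the idle window.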
