μDPM: Dynamic Power Management for the Microsecond Era

The complex, distributed nature of data centers has spawned the adoption of distributed, multi-tiered software architectures consisting of many interconnected microservices. These microservices exhibit extremely short request service times, often less than 250μs. We show that these “killer microsecond” service times can cause state-of-the-art dynamic power management techniques to break down, due to short idle periods and the overheads of transitioning into and out of low-power states. In this paper, we propose μDPM, a dynamic power management scheme for the microsecond era that coordinates request delaying, per-core sleep states, and voltage-frequency scaling. The idea is to postpone the wake-up of a CPU as long as possible and then adjust its frequency so that the tail latency constraints of requests are satisfied just in time. μDPM reduces processor energy consumption by up to 32% and consistently outperforms state-of-the-art techniques by 2x.

Keywords: Dynamic power management, DVFS, Sleep states
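The interplay between delaying a wake-up and picking a frequency can be illustrated with a small planner. This is a minimal sketch, not μDPM's actual algorithm. It assumes a single core serving a FIFO queue, per-request cycle counts that are known in advance, a uniform latency target measured from each request's arrival, a constant wake-up energy (so it drops out of the comparison), and a hypothetical power model P_active(f) = P_static + k·f³. The names `Request`, `latest_start_us`, and `plan_wakeup`, and all power parameters, are hypothetical. For each candidate frequency the sketch computes the latest moment the core can wake and still finish every queued request before its deadline, then selects the feasible frequency with the lowest estimated energy.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Request:
    arrival_us: float  # arrival time in microseconds
    cycles: float      # estimated work in CPU cycles (assumed known/predicted)


def latest_start_us(queue: List[Request], freq_ghz: float, slo_us: float) -> float:
    """Latest time the core can start serving the FIFO queue at freq_ghz
    such that every request completes by arrival_us + slo_us."""
    cycles_per_us = freq_ghz * 1e3  # 1 GHz = 1e3 cycles per microsecond
    elapsed = 0.0
    latest = float("inf")
    for req in queue:
        elapsed += req.cycles / cycles_per_us  # completion offset of this request
        latest = min(latest, req.arrival_us + slo_us - elapsed)
    return latest


def plan_wakeup(queue: List[Request], now_us: float, slo_us: float,
                freqs_ghz: List[float], wake_latency_us: float,
                p_static_w: float = 2.0, p_sleep_w: float = 0.1,
                k_w_per_ghz3: float = 1.0) -> Optional[Tuple[float, float]]:
    """Return (wake_at_us, freq_ghz) with the lowest estimated energy,
    or None if no frequency can meet the latency target.
    Hypothetical model: wake-up energy is constant, and the core sleeps
    whenever it is not busy, so only the busy interval differs across f."""
    total_cycles = sum(r.cycles for r in queue)
    best: Optional[Tuple[float, float]] = None
    best_energy = float("inf")
    for f in freqs_ghz:
        # Postpone the wake-up as long as this frequency allows.
        wake_at = latest_start_us(queue, f, slo_us) - wake_latency_us
        if wake_at < now_us:
            continue  # too late: this frequency cannot meet the target
        busy_us = total_cycles / (f * 1e3)
        p_active = p_static_w + k_w_per_ghz3 * f ** 3
        # Energy (in microjoules) that depends on f: active power replaces
        # sleep power for the duration of the busy interval.
        energy = (p_active - p_sleep_w) * busy_us
        if energy < best_energy:
            best_energy, best = energy, (wake_at, f)
    return best


if __name__ == "__main__":
    queue = [Request(0.0, 50_000), Request(10.0, 50_000)]
    freqs = [1.0, 1.5, 2.0, 2.5, 3.0]
    print(plan_wakeup(queue, now_us=20.0, slo_us=100.0,
                      freqs_ghz=freqs, wake_latency_us=5.0))
```

With the two 50,000-cycle requests above and a 100μs target, 1.0 GHz would require waking before the current time, so the planner falls back to the lowest feasible frequency (1.5 GHz) and delays the wake-up to roughly t = 38.3μs. Raising the assumed static power shifts the optimum toward higher frequencies, since finishing sooner shortens the interval over which that static power is paid.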
