μDPM: Dynamic Power Management for the Microsecond Era

The complex, distributed nature of data centers has spawned the adoption of distributed, multi-tiered software architectures consisting of many interconnected microservices. These microservices exhibit extremely short request service times, often less than 250μs. We show that these "killer microsecond" service times can cause state-of-the-art dynamic power management techniques to break down, due to short idle periods and the overhead of transitioning into and out of low-power states. In this paper, we propose μDPM, a dynamic power management scheme for the microsecond era that coordinates request delaying, per-core sleep states, and voltage-frequency scaling. The idea is to postpone the wake-up of a CPU as long as possible and then adjust the frequency so that the tail latency constraints of requests are satisfied just in time. μDPM reduces processor energy consumption by up to 32% and consistently outperforms state-of-the-art techniques by 2x.

Keywords: Dynamic power management, DVFS, Sleep states
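The central idea of delaying a sleeping core's wake-up and then choosing a frequency so that queued requests finish just before their tail-latency deadline can be illustrated with a toy model. The sketch below is not μDPM's actual algorithm. It assumes that each request's work is known in CPU cycles and scales inversely with frequency (ignoring memory-bound time), that requests are served FIFO, that waking takes a fixed `wake_latency_us`, and that sleeping only pays off above a hypothetical break-even threshold `min_sleep_us`. All names (`Plan`, `latest_wake_us`, `plan_wakeup`) are illustrative.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Plan:
    wake_at_us: float  # time at which the core starts leaving its sleep state
    freq_ghz: float    # frequency used to drain the queued requests


def latest_wake_us(arrivals_us: Sequence[float], cycles: Sequence[int],
                   freq_ghz: float, slo_us: float, wake_latency_us: float) -> float:
    """Latest wake-up start time such that every queued request, served FIFO at
    freq_ghz after the wake-up transition, completes by its arrival + slo_us."""
    latest = float("inf")
    finish_offset = wake_latency_us  # time from wake start to the current request's completion
    for arrival, c in zip(arrivals_us, cycles):
        finish_offset += c / (freq_ghz * 1000.0)  # 1 GHz = 1000 cycles per microsecond
        latest = min(latest, arrival + slo_us - finish_offset)
    return latest


def plan_wakeup(now_us: float, arrivals_us: Sequence[float], cycles: Sequence[int],
                freqs_ghz: Sequence[float], slo_us: float, wake_latency_us: float,
                min_sleep_us: float) -> Plan:
    """Toy policy (an assumption, not the paper's): try frequencies from lowest to
    highest and pick the first one that still leaves a sleep period of at least
    min_sleep_us before the core must wake; wake as late as that allows."""
    if not arrivals_us:
        raise ValueError("expected at least one queued request")
    for f in sorted(freqs_ghz):
        wake = latest_wake_us(arrivals_us, cycles, f, slo_us, wake_latency_us)
        if wake - now_us >= min_sleep_us:
            return Plan(wake_at_us=wake, freq_ghz=f)
    # Sleeping is not worthwhile (or the SLO is already at risk):
    # best effort is to wake immediately at the highest frequency.
    return Plan(wake_at_us=now_us, freq_ghz=max(freqs_ghz))


if __name__ == "__main__":
    # Two 100k-cycle requests arriving at 0 and 5 us with a 200 us tail-latency target.
    print(plan_wakeup(0.0, [0.0, 5.0], [100_000, 100_000], [1.0, 2.0, 3.0],
                      slo_us=200.0, wake_latency_us=10.0, min_sleep_us=20.0))
```

In the example, 1.0 GHz would require waking before time 0 and is rejected, while 2.0 GHz lets the core keep sleeping until 95 μs and still meet both deadlines. This captures the trade-off the abstract describes: the longer the wake-up is postponed, the higher the frequency needed to finish just in time.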
