Energy optimising methodologies on heterogeneous data centres

In 2013, U.S. data centres accounted for 2.2% of the country's total electricity consumption, a figure that is projected to increase rapidly over the next decade. A significant proportion of power consumed within a data centre is attributed to the servers, and a large percentage of that is wasted as workloads compete for shared resources. Many data centres host interactive workloads (e.g., web search or e-commerce), for which it is critical to meet user expectations and user experience, called Quality of Service (QoS). There is also a wish to run both interactive and batch workloads on the same infrastructure to increase cluster utilisation and reduce operational costs and total energy consumption. Although much work has focused on the impacts of shared resource contention, it still remains a major problem to maintain QoS for both interactive and batch workloads. The goal of this thesis is twofold. First, to investigate how, and to what extent, resource contention has an effect on throughput and power of batch workloads via modelling. Second, we introduce a scheduling approach to determine on-the-fly the best configuration to satisfy the QoS for latency-critical jobs on any architecture. To achieve the above goals, we first propose a modelling technique to estimate server performance and power at runtime called Runtime Estimation of Performance and Power (REPP). REPP's goal is to allow administrators' control on power and performance of processors. REPP achieves this goal by estimating performance and power at multiple hardware settings (dynamic frequency and voltage states (DVFS), core consolidation and idle states) and dynamically sets these settings based on user-defined constraints. The hardware counters required to build the models are available across architectures, making it architecture agnostic. We also argue that traditional modelling and scheduling strategies are ineffective for interactive workloads. To manage such workloads, we propose Hipster that combines both a heuristic, and a reinforcement learning algorithm to manage interactive workloads. Hipster's goal is to improve resource efficiency while respecting the QoS of interactive workloads. Hipster achieves its goal by exploring the multicore system and DVFS. To improve utilisation and make the best usage of the available resources, Hipster can dynamically assign remaining cores to batch workloads without violating the QoS constraints for the interactive workloads. We implemented REPP and Hipster in real-life platforms, namely 64-bit commercial (Intel SandyBridge and AMD Phenom II X4 B97) and experimental hardware (ARM big.LITTLE Juno R1). After obtaining extensive experimental results, we have shown that REPP successfully estimates power and performance of several single-threaded and multiprogrammed workloads. The average errors on Intel, AMD and ARM architectures are, respectively, 7.1%, 9.0%, 7.1% when predicting performance, and 8.1%, 6.5%, 6.0% when predicting power. Similarly, we show that when compared to prior work, Hipster improves the QoS guarantee for Web-Search from 80% to 96%, and for Memcached from 92% to 99%, while reducing the energy consumption by up to 18% on the ARM architecture.

[1]  Sherief Reda,et al.  Pack & Cap: Adaptive DVFS and thread packing under power caps , 2011, 2011 44th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[2]  Michael B. Giles,et al.  Benchmarking the IBM Power8 processor , 2015, CASCON.

[3]  Scott A. Mahlke,et al.  Heterogeneous microarchitectures trump voltage scaling for low-power cores , 2014, 2014 23rd International Conference on Parallel Architecture and Compilation (PACT).

[4]  Lingjia Tang,et al.  Directly characterizing cross core interference through contention synthesis , 2011, HiPEAC.

[5]  Michael Stumm,et al.  Thread clustering: sharing-aware scheduling on SMP-CMP-SMT multiprocessors , 2007, EuroSys '07.

[6]  Israel Koren,et al.  A Study on the Use of Performance Counters to Estimate Power in Microprocessors , 2013, IEEE Transactions on Circuits and Systems II: Express Briefs.

[7]  John L. Henning SPEC CPU2006 benchmark descriptions , 2006, CARN.

[8]  Lieven Eeckhout,et al.  Sniper: Exploring the level of abstraction for scalable and accurate parallel multi-core simulation , 2011, 2011 International Conference for High Performance Computing, Networking, Storage and Analysis (SC).

[9]  Rami G. Melhem,et al.  Energy-Efficient Thread Assignment Optimization for Heterogeneous Multicore Systems , 2015, ACM Trans. Embed. Comput. Syst..

[10]  Lizy Kurian John,et al.  Complete System Power Estimation Using Processor Performance Events , 2012, IEEE Transactions on Computers.

[11]  Bishop Brock,et al.  Introducing the Adaptive Energy Management Features of the Power7 Chip , 2011, IEEE Micro.

[12]  Tajana Simunic,et al.  System-Level Power Management Using Online Learning , 2009, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems.

[13]  T. N. Vijaykumar,et al.  TimeTrader: Exploiting latency tail to save datacenter energy for online search , 2015, 2015 48th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[14]  Rajiv Nishtala,et al.  A Methodology to Build Models and Predict Performance-Power in CMPs , 2015, 2015 44th International Conference on Parallel Processing Workshops.

[15]  Christina Delimitrou,et al.  Quasar: resource-efficient and QoS-aware cluster management , 2014, ASPLOS.

[16]  Mor Harchol-Balter,et al.  Optimality analysis of energy-performance trade-off for server farm management , 2010, Perform. Evaluation.

[17]  Pradip Bose,et al.  A case for guarded power gating for multi-core processors , 2011, 2011 IEEE 17th International Symposium on High Performance Computer Architecture.

[18]  Kevin Skadron,et al.  Bubble-up: Increasing utilization in modern warehouse scale computers via sensible co-locations , 2011, 2011 44th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[19]  Li Shen,et al.  PPEP: Online Performance, Power, and Energy Prediction Framework and DVFS Space Exploration , 2014, 2014 47th Annual IEEE/ACM International Symposium on Microarchitecture.

[20]  Daniel Mossé,et al.  Energy-aware thread co-location in heterogeneous multicore processors , 2013, 2013 Proceedings of the International Conference on Embedded Software (EMSOFT).

[21]  Daniel Mossé,et al.  REPP-H: Runtime Estimation of Power and Performance on Heterogeneous Data Centers , 2016, 2016 28th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD).

[22]  Lothar Thiele,et al.  Thermal-Aware Global Real-Time Scheduling on Multicore Systems , 2009, 2009 15th IEEE Real-Time and Embedded Technology and Applications Symposium.

[23]  Eduard Ayguadé,et al.  PARSECSs: Evaluating the Impact of Task Parallelism in the PARSEC Benchmark Suite , 2016, ACM Trans. Archit. Code Optim..

[24]  Paul M. Carpenter,et al.  Hipster: Hybrid Task Manager for Latency-Critical Cloud Workloads , 2017, 2017 IEEE International Symposium on High Performance Computer Architecture (HPCA).

[25]  Daniel Mossé,et al.  Octopus-Man: QoS-driven task management for heterogeneous multicores in warehouse-scale computers , 2015, 2015 IEEE 21st International Symposium on High Performance Computer Architecture (HPCA).

[26]  Tony Tung,et al.  Scaling Memcache at Facebook , 2013, NSDI.

[27]  Xu Zhou,et al.  GreenGear: Leveraging and Managing Server Heterogeneity for Improving Energy Efficiency in Green Data Centers , 2016, ICS.

[28]  Thomas F. Wenisch,et al.  CoScale: Coordinating CPU and Memory System DVFS in Server Systems , 2012, 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture.

[29]  Hiroshi Sasaki,et al.  Power-capped DVFS and thread allocation with ANN models on modern NUMA systems , 2014, 2014 IEEE 32nd International Conference on Computer Design (ICCD).

[30]  Ripal Nathuji,et al.  Exploiting Platform Heterogeneity for Power Efficient Data Centers , 2007, Fourth International Conference on Autonomic Computing (ICAC'07).

[31]  Margaret Martonosi,et al.  An Analysis of Efficient Multi-Core Global Power Management Policies: Maximizing Performance for a Given Power Budget , 2006, 2006 39th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'06).

[32]  Christopher Torng,et al.  Enabling Realistic Fine-Grain Voltage Scaling with Reconfigurable Power Distribution Networks , 2014, 2014 47th Annual IEEE/ACM International Symposium on Microarchitecture.

[33]  Osman S. Unsal,et al.  System-level power estimation tool for embedded processor based platforms , 2014, RAPIDO '14.

[34]  Sadagopan Srinivasan,et al.  Efficient interaction between OS and architecture in heterogeneous platforms , 2011, OPSR.

[35]  Rajesh Gupta,et al.  Evaluating the effectiveness of model-based power characterization , 2011 .

[36]  Manuel Prieto,et al.  Survey of scheduling techniques for addressing shared resources in multicore processors , 2012, CSUR.

[37]  David Bernstein,et al.  Containers and Cloud: From LXC to Docker to Kubernetes , 2014, IEEE Cloud Computing.

[38]  Kai Li,et al.  The PARSEC benchmark suite: Characterization and architectural implications , 2008, 2008 International Conference on Parallel Architectures and Compilation Techniques (PACT).

[39]  Anoop Gupta,et al.  The SPLASH-2 programs: characterization and methodological considerations , 1995, ISCA.

[40]  Rajiv Nishtala,et al.  RePP-C: Runtime estimation of performance-power with workload consolidation in CMPs , 2016, 2016 Seventh International Green and Sustainable Computing Conference (IGSC).

[41]  Christina Delimitrou,et al.  QoS-Aware scheduling in heterogeneous datacenters with paragon , 2013, TOCS.

[42]  Francisco Vilar Brasileiro,et al.  Long-term SLOs for reclaimed cloud computing resources , 2014, SoCC.

[43]  Margaret Martonosi,et al.  Phase characterization for power: evaluating control-flow-based and event-counter-based techniques , 2006, The Twelfth International Symposium on High-Performance Computer Architecture, 2006..

[44]  G. Dhiman,et al.  Dynamic Power Management Using Machine Learning , 2006, 2006 IEEE/ACM International Conference on Computer Aided Design.

[45]  Milan Radulovic,et al.  Performance Impact of a Slower Main Memory: A case study of STT-MRAM in HPC , 2016, MEMSYS.

[46]  D.M. Mount,et al.  An Efficient k-Means Clustering Algorithm: Analysis and Implementation , 2002, IEEE Trans. Pattern Anal. Mach. Intell..

[47]  Alex C. Snoeren,et al.  Inside the Social Network's (Datacenter) Network , 2015, Comput. Commun. Rev..

[48]  Lingjia Tang,et al.  Heterogeneity in “Homogeneous” Warehouse-Scale Computers: A Performance Opportunity , 2011, IEEE Computer Architecture Letters.

[49]  Stefanos Kaxiras,et al.  Green governors: A framework for Continuously Adaptive DVFS , 2011, 2011 International Green Computing Conference and Workshops.

[50]  Martin L. Puterman,et al.  Markov Decision Processes: Discrete Stochastic Dynamic Programming , 1994 .

[51]  Jordi Torres,et al.  Reducing wasted resources to help achieve green data centers , 2008, 2008 IEEE International Symposium on Parallel and Distributed Processing.

[52]  Tajana Simunic,et al.  Dynamic voltage frequency scaling for multi-tasking systems using online learning , 2007, Proceedings of the 2007 international symposium on Low power electronics and design (ISLPED '07).

[53]  Paul M. Carpenter,et al.  Large-Memory Nodes for Energy Efficient High-Performance Computing , 2016, MEMSYS.

[54]  Gerald Tesauro,et al.  Online Resource Allocation Using Decompositional Reinforcement Learning , 2005, AAAI.

[55]  Mattan Erez,et al.  Dirigent: Enforcing QoS for Latency-Critical Tasks on Shared Multicore Systems , 2016, ASPLOS.

[56]  Meeta Sharma Gupta,et al.  Systematic Energy Characterization of CMP/SMT Processor Systems via Automated Micro-Benchmarks , 2012, 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture.

[57]  Vishakha Gupta,et al.  Attaining system performance points: revisiting the end-to-end argument in system design for heterogeneous many-core systems , 2011, OPSR.

[58]  Margaret Martonosi,et al.  Capping the brown energy consumption of Internet services at low cost , 2010, International Conference on Green Computing.

[59]  Martin Schulz,et al.  Exploring hardware overprovisioning in power-constrained, high performance computing , 2013, ICS '13.

[60]  Laurent Lefèvre,et al.  "Big, Medium, Little": Reaching Energy Proportionality with Heterogeneous Computing Scheduler , 2015, Parallel Process. Lett..

[61]  Ilkka Tuomi,et al.  The Lives and Death of Moore's Law , 2002, First Monday.

[62]  Lingjia Tang,et al.  Bubble-flux: precise online QoS management for increased utilization in warehouse scale computers , 2013, ISCA.

[63]  Christoforos E. Kozyrakis,et al.  Towards energy proportionality for large-scale latency-critical workloads , 2014, 2014 ACM/IEEE 41st International Symposium on Computer Architecture (ISCA).

[64]  Daniel Wong,et al.  KnightShift: Scaling the Energy Proportionality Wall through Server-Level Heterogeneity , 2012, 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture.

[65]  Calvin Lin,et al.  A comprehensive approach to DRAM power management , 2008, 2008 IEEE 14th International Symposium on High Performance Computer Architecture.

[66]  Jordi Torres,et al.  GreenSlot: Scheduling energy consumption in green datacenters , 2011, 2011 International Conference for High Performance Computing, Networking, Storage and Analysis (SC).

[67]  Manuel Prieto,et al.  Survey of Energy-Cognizant Scheduling Techniques , 2013, IEEE Transactions on Parallel and Distributed Systems.

[68]  Babak Falsafi,et al.  Clearing the clouds: a study of emerging scale-out workloads on modern hardware , 2012, ASPLOS XVII.

[69]  Vijay Janapa Reddi,et al.  Mobile CPU's rise to power: Quantifying the impact of generational mobile CPU design trends on performance, energy, and user satisfaction , 2016, 2016 IEEE International Symposium on High Performance Computer Architecture (HPCA).

[70]  Norman P. Jouppi,et al.  Core architecture optimization for heterogeneous chip multiprocessors , 2006, 2006 International Conference on Parallel Architectures and Compilation Techniques (PACT).

[71]  Kushagra Vaid,et al.  Web search using mobile cores: quantifying and mitigating the price of efficiency , 2010, ISCA.

[72]  Sherief Reda,et al.  Adaptive Power Capping for Servers with Multithreaded Workloads , 2012, IEEE Micro.

[73]  Daniel Wong,et al.  Peak Efficiency Aware Scheduling for Highly Energy Proportional Servers , 2016, 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA).

[74]  Aman Kansal,et al.  Q-clouds: managing performance interference effects for QoS-aware clouds , 2010, EuroSys '10.

[75]  Wonho Kim,et al.  Kraken: Leveraging Live Traffic Tests to Identify and Resolve Resource Utilization Bottlenecks in Large Scale Web Services , 2016, OSDI.

[76]  D. Vengerov,et al.  A Methodology for Developing Simple and Robust Power Models Using Performance Monitoring Events , 2009 .

[77]  Christoforos E. Kozyrakis,et al.  Improving Resource Efficiency at Scale with Heracles , 2016, ACM Trans. Comput. Syst..

[78]  Hui Ding,et al.  TAO: Facebook's Distributed Data Store for the Social Graph , 2013, USENIX Annual Technical Conference.

[79]  Bin Li,et al.  Dynamo: Facebook's Data Center-Wide Power Management System , 2016, 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA).

[80]  Jordi Torres,et al.  Energy accounting for shared virtualized environments under DVFS using PMC-based power models , 2012, Future Gener. Comput. Syst..

[81]  Thomas F. Wenisch,et al.  Power management of online data-intensive services , 2011, 2011 38th Annual International Symposium on Computer Architecture (ISCA).

[82]  Gu-Yeon Wei,et al.  Profiling a Warehouse-Scale Computer , 2016, IEEE Micro.

[83]  Norman P. Jouppi,et al.  Single-ISA heterogeneous multi-core architectures: the potential for processor power reduction , 2003, Proceedings. 36th Annual IEEE/ACM International Symposium on Microarchitecture, 2003. MICRO-36..

[84]  Daniel Mossé,et al.  Lucky Scheduling for Energy-Efficient Heterogeneous Multi-Core Systems , 2012, HotPower.

[85]  Robert N. M. Watson,et al.  Firmament: Fast, Centralized Cluster Scheduling at Scale , 2016, OSDI.

[86]  Mahmut T. Kandemir,et al.  Evaluating STT-RAM as an energy-efficient main memory alternative , 2013, 2013 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS).

[87]  Martin Schulz,et al.  Beyond DVFS: A First Look at Performance under a Hardware-Enforced Power Bound , 2012, 2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum.

[88]  David H. Bailey,et al.  The NAS parallel benchmarks summary and preliminary results , 1991, Proceedings of the 1991 ACM/IEEE Conference on Supercomputing (Supercomputing '91).

[89]  Ricardo Bianchini,et al.  DeepDive: Transparently Identifying and Managing Performance Interference in Virtualized Environments , 2013, USENIX Annual Technical Conference.

[90]  T. K. Prakash,et al.  Performance Characterization of SPEC CPU 2006 Benchmarks on Intel Core 2 Duo Processor , .

[91]  Sriram Sankar,et al.  The need for speed and stability in data center power capping , 2012, 2012 International Green Computing Conference (IGCC).

[92]  Margaret Martonosi,et al.  Exploring the Potential of CMP Core Count Management on Data Center Energy Savings , 2011 .

[93]  M.K. Patterson,et al.  The effect of data center temperature on energy efficiency , 2008, 2008 11th Intersociety Conference on Thermal and Thermomechanical Phenomena in Electronic Systems.

[94]  Luiz André Barroso,et al.  Web Search for a Planet: The Google Cluster Architecture , 2003, IEEE Micro.

[95]  Manuel Prieto,et al.  A comprehensive scheduler for asymmetric multicore systems , 2010, EuroSys '10.

[96]  Daniel Sánchez,et al.  Rubik: Fast analytical power management for latency-critical systems , 2015, 2015 48th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[97]  Vanish Talwar,et al.  Power Management of Datacenter Workloads Using Per-Core Power Gating , 2009, IEEE Computer Architecture Letters.

[98]  Xiao Zhang,et al.  CPI2: CPU performance isolation for shared compute clusters , 2013, EuroSys '13.

[99]  Patrick Crowley,et al.  Dynamic thread assignment on heterogeneous multiprocessor architectures , 2006, CF '06.

[100]  Margaret Martonosi,et al.  Wattch: a framework for architectural-level power analysis and optimizations , 2000, Proceedings of 27th International Symposium on Computer Architecture (IEEE Cat. No.RS00201).

[101]  Richard S. Sutton,et al.  Reinforcement Learning: An Introduction , 1998, IEEE Trans. Neural Networks.

[102]  Michael Werner,et al.  Wake-up latencies for processor idle states on current x86 processors , 2014, Computer Science - Research and Development.

[103]  Jason Cong,et al.  Energy-efficient scheduling on heterogeneous multi-core architectures , 2012, ISLPED '12.

[104]  Shane Legg,et al.  Human-level control through deep reinforcement learning , 2015, Nature.

[105]  Margaret Martonosi,et al.  Runtime power monitoring in high-end processors: methodology and empirical data , 2003, Proceedings. 36th Annual IEEE/ACM International Symposium on Microarchitecture, 2003. MICRO-36..

[106]  Frank Bellosa,et al.  The benefits of event: driven energy accounting in power-sensitive systems , 2000, ACM SIGOPS European Workshop.

[107]  Mohammad Alian,et al.  NCAP: Network-Driven, Packet Context-Aware Power Management for Client-Server Architecture , 2017, 2017 IEEE International Symposium on High Performance Computer Architecture (HPCA).

[108]  Geoff V. Merrett,et al.  The slowdown or race-to-idle question: Workload-aware energy optimization of SMT multicore platforms under process variation , 2016, 2016 Design, Automation & Test in Europe Conference & Exhibition (DATE).

[109]  Houman Homayoun,et al.  Managing distributed UPS energy for effective power capping in data centers , 2012, 2012 39th Annual International Symposium on Computer Architecture (ISCA).

[110]  Rajarshi Das,et al.  A Hybrid Reinforcement Learning Approach to Autonomic Resource Allocation , 2006, 2006 IEEE International Conference on Autonomic Computing.

[111]  Eduard Ayguadé,et al.  A Systematic Methodology to Generate Decomposable and Responsive Power Models for CMPs , 2013, IEEE Transactions on Computers.

[112]  Ananta Tiwari,et al.  Making the Most of SMT in HPC , 2014, ACM Trans. Archit. Code Optim..

[113]  Scott A. Mahlke,et al.  Trace based phase prediction for tightly-coupled heterogeneous cores , 2013, 2013 46th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[114]  Michael Stumm,et al.  RapidMRC: approximating L2 miss rate curves on commodity systems for online optimizations , 2009, ASPLOS.

[115]  Christoforos E. Kozyrakis,et al.  Dynamic management of TurboMode in modern multi-core chips , 2014, 2014 IEEE 20th International Symposium on High Performance Computer Architecture (HPCA).

[116]  Manuel Prieto,et al.  Maximizing Power Efficiency with Asymmetric Multicore Systems , 2009, ACM Queue.

[117]  Benjamin C. Lee,et al.  Navigating heterogeneous processors with market mechanisms , 2013, 2013 IEEE 19th International Symposium on High Performance Computer Architecture (HPCA).

[118]  Christoforos E. Kozyrakis,et al.  Heracles: Improving resource efficiency at scale , 2015, 2015 ACM/IEEE 42nd Annual International Symposium on Computer Architecture (ISCA).

[119]  Karthick Rajamani,et al.  Designing Energy-Efficient Servers and Data Centers , 2010, Computer.

[120]  Sally A. McKee,et al.  Real time power estimation and thread scheduling via performance counters , 2009, CARN.

[121]  Chandra Sekhar Seelamantula,et al.  On the Selection of Optimum Savitzky-Golay Filters , 2013, IEEE Transactions on Signal Processing.

[122]  Seung-Moon Yoo,et al.  A framework for dynamic energy efficiency and temperature management , 2000, Proceedings 33rd Annual IEEE/ACM International Symposium on Microarchitecture. MICRO-33 2000.

[123]  Yale N. Patt,et al.  Predicting Performance Impact of DVFS for Realistic Memory Systems , 2012, 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture.

[124]  Rami G. Melhem,et al.  Thread Assignment Optimization with Real-Time Performance and Memory Bandwidth Guarantees for Energy-Efficient Heterogeneous Multi-core Systems , 2012, 2012 IEEE 18th Real Time and Embedded Technology and Applications Symposium.

[125]  Luiz André Barroso,et al.  The Datacenter as a Computer: An Introduction to the Design of Warehouse-Scale Machines , 2009, The Datacenter as a Computer: An Introduction to the Design of Warehouse-Scale Machines.

[126]  Hiroshi Sasaki,et al.  Coordinated power-performance optimization in manycores , 2013, Proceedings of the 22nd International Conference on Parallel Architectures and Compilation Techniques.

[127]  杨晓娟,et al.  Tales of Tail , 2013 .

[128]  Laurent Lefèvre,et al.  Energy Aware Dynamic Provisioning for Heterogeneous Data Centers , 2016, 2016 28th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD).

[129]  Tong Li,et al.  Using OS Observations to Improve Performance in Multicore Systems , 2008, IEEE Micro.

[130]  Mor Harchol-Balter,et al.  Optimal power allocation in server farms , 2009, SIGMETRICS '09.

[131]  Yan Cui,et al.  Reducing Shared Cache Contention by Scheduling Order Adjustment on Commodity Multi-cores , 2011, 2011 IEEE International Symposium on Parallel and Distributed Processing Workshops and Phd Forum.

[132]  Luiz André Barroso,et al.  The Datacenter as a Computer: An Introduction to the Design of Warehouse-Scale Machines, Second Edition , 2013, The Datacenter as a Computer: An Introduction to the Design of Warehouse-Scale Machines, Second Edition.

[133]  Li Zhao,et al.  QuickIA: Exploring heterogeneous architectures on real prototypes , 2012, IEEE International Symposium on High-Performance Comp Architecture.

[134]  Martin Schulz,et al.  Practical performance prediction under Dynamic Voltage Frequency Scaling , 2011, 2011 International Green Computing Conference and Workshops.

[135]  Li Shen,et al.  Implementing a Leading Loads Performance Predictor on Commodity Processors , 2014, USENIX Annual Technical Conference.

[136]  C. Kozyrakis,et al.  Scalable and Efficient Fine-grained Cache Partitioning with Vantage the Vantage Cache-partitioning Technique Enables Configurability and Quality-of-service Guarantees in Large-scale Chip Multiprocessors with Shared Caches. Caches Can Have Hundreds of Partitions with Sizes Specified at Cache Line Gra , 2011 .

[137]  Alexandra Fedorova,et al.  Contention-Aware Scheduling on Multicore Systems , 2010, TOCS.

[138]  Jennifer L. Wong,et al.  To hardware prefetch or not to prefetch?: a virtualized environment study and core binding approach , 2013, ASPLOS '13.

[139]  Michael D. Smith,et al.  Improving Performance Isolation on Chip Multiprocessors via an Operating System Scheduler , 2007, 16th International Conference on Parallel Architecture and Compilation Techniques (PACT 2007).

[140]  Xue Liu,et al.  Dynamic Voltage Scaling in Multitier Web Servers with End-to-End Delay Control , 2007, IEEE Transactions on Computers.

[141]  Chandrasekhar Narayanaswami,et al.  PowerNap: an efficient power management scheme for mobile devices , 2006, IEEE Transactions on Mobile Computing.

[142]  Dakai Zhu,et al.  Energy management of standby-sparing systems for fixed-priority real-time workloads , 2013, 2013 International Green Computing Conference Proceedings.

[143]  Meeta Sharma Gupta,et al.  System level analysis of fast, per-core DVFS using on-chip switching regulators , 2008, 2008 IEEE 14th International Symposium on High Performance Computer Architecture.

[144]  Eduard Ayguadé,et al.  Counter-Based Power Modeling Methods: Top-Down vs. Bottom-Up , 2013, Comput. J..

[145]  Nian-Feng Tzeng,et al.  Chaotic attractor prediction for server run-time energy consumption , 2010 .

[146]  Heba Khdr,et al.  Towards performance and reliability-efficient computing in the dark silicon era , 2016, 2016 Design, Automation & Test in Europe Conference & Exhibition (DATE).

[147]  Onur Mutlu,et al.  Self-Optimizing Memory Controllers: A Reinforcement Learning Approach , 2008, 2008 International Symposium on Computer Architecture.

[148]  Xiaorui Wang,et al.  Power capping: a prelude to power shifting , 2008, Cluster Computing.

[149]  Gang Ren,et al.  Google-Wide Profiling: A Continuous Profiling Infrastructure for Data Centers , 2010, IEEE Micro.

[150]  Lingjia Tang,et al.  Whare-map: heterogeneity in "homogeneous" warehouse-scale computers , 2013, ISCA.

[151]  Christina Delimitrou,et al.  Paragon: QoS-aware scheduling for heterogeneous datacenters , 2013, ASPLOS '13.

[152]  Dheeraj Reddy,et al.  Bias scheduling in heterogeneous multi-core architectures , 2010, EuroSys '10.

[153]  Mateo Valero,et al.  Supercomputing with commodity CPUs: Are mobile SoCs ready for HPC? , 2013, 2013 SC - International Conference for High Performance Computing, Networking, Storage and Analysis (SC).

[154]  Song Jiang,et al.  Workload analysis of a large-scale key-value store , 2012, SIGMETRICS '12.

[155]  Christina Delimitrou,et al.  Tarcil: reconciling scheduling speed and quality in large shared clusters , 2015, SoCC.

[156]  Christoforos E. Kozyrakis,et al.  Energy proportionality and workload consolidation for latency-critical applications , 2015, SoCC.

[157]  Josep Llosa,et al.  Out-of-order commit processors , 2004, 10th International Symposium on High Performance Computer Architecture (HPCA'04).

[158]  Hua Chen,et al.  Pingmesh: A Large-Scale System for Data Center Network Latency Measurement and Analysis , 2015, SIGCOMM.