REPP-H: Runtime Estimation of Power and Performance on Heterogeneous Data Centers

One of the main challenges in data center systems is meeting a given Quality of Service (QoS) target while minimizing power consumption. Increasingly, data centers are adopting heterogeneous server architectures with different power-performance trade-offs. Meeting specified power and performance requirements on such systems requires a careful understanding of application behavior across multiple architectures at runtime. In this work, we present and evaluate REPP-H (Runtime Estimation of Performance and Power on Heterogeneous data centers). REPP-H leverages the hardware performance counters available on all major server architectures to provide a highly responsive power-capping mechanism that also delivers a minimum level of performance in a single step. We experimentally show that REPP-H can successfully estimate the power and performance of several single-threaded and multiprogrammed workloads. On the ARM, AMD, and Intel servers, the average errors are, respectively, 7.1%, 9.0%, and 7.1% when predicting performance, and 6.0%, 6.5%, and 8.1% when predicting power.
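As a rough illustration of what single-step selection under a power cap and a performance floor can look like, the Python sketch below assumes that per-configuration predictions are already available. It is not REPP-H's actual model. The linear counter-based power model, the hypothetical `Candidate` fields, the event names, the made-up numbers, and the error metric (mean absolute percentage error) are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Sequence


@dataclass(frozen=True)
class Candidate:
    """Hypothetical operating point: one server architecture at one DVFS frequency."""
    arch: str
    freq_ghz: float
    static_w: float                     # assumed static power (W)
    joules_per_event: Dict[str, float]  # assumed energy cost per counter event (J)
    predicted_rates: Dict[str, float]   # assumed predicted counter rates here (events/s)
    predicted_ips: float                # assumed predicted instructions per second


def predict_power(c: Candidate) -> float:
    """Assumed linear counter-based model: static power plus sum of (J/event * events/s)."""
    dynamic = sum(
        energy * c.predicted_rates.get(event, 0.0)
        for event, energy in c.joules_per_event.items()
    )
    return c.static_w + dynamic


def select(cands: Sequence[Candidate], power_cap_w: float,
           min_ips: float) -> Optional[Candidate]:
    """Score every candidate once; among those under the power cap that meet the
    QoS floor, return the lowest-power one (None if no candidate is feasible)."""
    best: Optional[Candidate] = None
    best_power = float("inf")
    for c in cands:
        power = predict_power(c)
        if power <= power_cap_w and c.predicted_ips >= min_ips and power < best_power:
            best, best_power = c, power
    return best


def mean_abs_pct_error(predicted: Sequence[float], measured: Sequence[float]) -> float:
    """Mean absolute percentage error, one common way to report average prediction error."""
    if len(predicted) != len(measured) or not measured:
        raise ValueError("need equal-length, non-empty sequences")
    total = sum(abs(p - m) / abs(m) for p, m in zip(predicted, measured))
    return 100.0 * total / len(measured)


# Made-up numbers, for illustration only.
cands = [
    Candidate("arm", 1.2, 2.0, {"instructions": 1e-9}, {"instructions": 1.5e9}, 1.5e9),
    Candidate("amd", 2.0, 20.0, {"instructions": 2e-9, "llc_misses": 5e-8},
              {"instructions": 4e9, "llc_misses": 1e8}, 4e9),
    Candidate("intel", 2.4, 15.0, {"instructions": 1.5e-9}, {"instructions": 5e9}, 5e9),
]
choice = select(cands, power_cap_w=30.0, min_ips=2e9)
print(choice.arch if choice else "no feasible configuration")  # prints "intel"
```

Because every candidate is scored once and the best feasible one is picked directly, this sketch needs no iterative search over frequencies or servers. A real system would replace the hard-coded predictions with models trained per architecture from measured counter data.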
