A Survey of Power and Energy Predictive Models in HPC Systems and Applications

Power and energy efficiency are now critical concerns in extreme-scale high-performance scientific computing. Many extreme-scale computing systems today (for example, those on the Top500 list) tightly integrate multicore CPUs and accelerators (a mix of Graphics Processing Units, Intel Xeon Phis, or Field Programmable Gate Arrays), which lets them not only deliver unprecedented computational power but also address these concerns. However, such integration renders these systems highly heterogeneous and hierarchical, thereby necessitating the design of novel performance, power, and energy models that accurately capture these inherent characteristics. Several extensive research efforts now focus exclusively on power and energy efficiency models and techniques for the processors composing these extreme-scale computing systems. This article synthesizes these research efforts, concentrating on predictive power and energy models with primary emphasis on node architecture. Through this survey, we also intend to highlight the shortcomings of these models in correctly and comprehensively predicting power and energy consumption when the hierarchical and heterogeneous nature of these tightly integrated high-performance computing systems is taken into account.
