EPPMiner: An Extended Benchmark Suite for Energy, Power and Performance Characterization of Heterogeneous Architecture

To address the ever-increasing demand for computing capacity, more and more heterogeneous systems are being designed to use both general-purpose and special-purpose processors. However, the huge energy consumption of these systems raises new environmental concerns and challenges. Besides performance, energy efficiency is now a key factor for both system designers and consumers. In this paper, we present EPPMiner, a benchmark suite for evaluating the performance, power, and energy of different heterogeneous systems. EPPMiner consists of 16 benchmark programs that cover a broad range of application domains and vary widely in how intensively they utilize the processors. We have implemented a prototype of EPPMiner that supports OpenMP, CUDA, and OpenCL, and we demonstrate its usage with three showcases. First, we use EPPMiner to compare the power efficiency of a set of processors, including two Intel x86 CPUs, two Nvidia GPUs, and one AMD GPU. Second, we investigate the impact of multi-threading on the power efficiency of multi-core CPUs. Finally, we use EPPMiner to illustrate the effectiveness of GPU Dynamic Voltage and Frequency Scaling (DVFS) on the power efficiency of GPGPU applications. We show that DVFS can improve energy efficiency by 86% over the default setting on an AMD GPU.
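As a rough illustration of how power, energy, and energy efficiency relate in a characterization like this, the following minimal sketch integrates sampled power readings into energy and compares two runs. The function names, the trapezoidal integration, and the "work per joule" efficiency definition are assumptions made for this example; the abstract does not state which metric definitions or measurement method EPPMiner uses, and the sample values are made up.

```python
from typing import Sequence


def energy_joules(timestamps_s: Sequence[float], power_w: Sequence[float]) -> float:
    """Integrate sampled power (watts) over time (seconds) with the trapezoidal rule."""
    if len(timestamps_s) != len(power_w):
        raise ValueError("timestamps and power samples must have the same length")
    if len(timestamps_s) < 2:
        raise ValueError("need at least two samples to integrate")
    total = 0.0
    for i in range(1, len(timestamps_s)):
        dt = timestamps_s[i] - timestamps_s[i - 1]
        if dt <= 0:
            raise ValueError("timestamps must be strictly increasing")
        # Area of the trapezoid between two consecutive power samples.
        total += 0.5 * (power_w[i] + power_w[i - 1]) * dt
    return total


def energy_efficiency(work_units: float, energy_j: float) -> float:
    """Assumed metric: units of work completed per joule consumed."""
    return work_units / energy_j


def relative_improvement(baseline: float, tuned: float) -> float:
    """Fractional gain of the tuned value over the baseline value."""
    return (tuned - baseline) / baseline


if __name__ == "__main__":
    work = 1e9  # hypothetical operation count, identical for both runs

    # Hypothetical default setting: 100 W for 2 s.
    e_default = energy_joules([0.0, 1.0, 2.0], [100.0, 100.0, 100.0])
    # Hypothetical scaled setting: slower (3 s) but only 50 W.
    e_scaled = energy_joules([0.0, 1.0, 2.0, 3.0], [50.0, 50.0, 50.0, 50.0])

    gain = relative_improvement(
        energy_efficiency(work, e_default),
        energy_efficiency(work, e_scaled),
    )
    print(f"default: {e_default} J, scaled: {e_scaled} J, gain: {gain:.1%}")
```

The example shows why a lower-frequency setting can be more energy efficient even when it runs longer: the drop in average power outweighs the increase in execution time.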
