Paper information - Coordinated energy management in heterogeneous processors

Coordinated energy management in heterogeneous processors

This paper examines energy management in a heterogeneous processor consisting of an integrated CPU-GPU for high-performance computing (HPC) applications. Energy management for HPC applications is challenged by their uncompromising performance requirements and complicated by the need for coordinating energy management across distinct core types - a new and less understood problem. We examine the intra-node CPU-GPU frequency sensitivity of HPC applications on tightly coupled CPU-GPU architectures as the first step in understanding power and performance optimization for a heterogeneous multi-node HPC system. The insights from this analysis form the basis of a coordinated energy management scheme, called DynaCo, for integrated CPU-GPU architectures. We implement DynaCo on a modern heterogeneous processor and compare its performance to a state-of-the-art power- and performance-management algorithm. DynaCo improves measured average energy-delay squared (ED^2) product by up to 30% with less than 2% average performance loss across several exascale and other HPC workloads.

Sudhakar Yalamanchili | Indrani Paul | Manish Arora | Vignesh T. Ravi | Srilatha Manne

[1] Pradip Bose,et al. Microarchitecture-Level Power-Performance Simulators: Modeling, Validation, and Impact on Design, 2003.

[2] Hyesoon Kim,et al. TAP: A TLP-aware cache management policy for a CPU-GPU heterogeneous architecture , 2012, IEEE International Symposium on High-Performance Computer Architecture.

[3] Margaret Martonosi,et al. Formal online methods for voltage/frequency control in multiple clock domain microprocessors , 2004, ASPLOS XI.

[4] Bronis R. de Supinski,et al. Adagio: making DVS practical for complex HPC applications , 2009, ICS.

[5] Sandia Report,et al. Improving Performance via Mini-applications, 2009.

[6] Scott B. Baden,et al. Redefining the Role of the CPU in the Era of CPU-GPU Integration , 2012, IEEE Micro.

[7] Bruce Jacob,et al. A control-theoretic approach to dynamic voltage scheduling , 2003, CASES '03.

[8] Gagan Agrawal,et al. Accelerating MapReduce on a coupled CPU-GPU architecture , 2012, 2012 International Conference for High Performance Computing, Networking, Storage and Analysis.

[9] Margaret Martonosi,et al. Dynamic-Compiler-Driven Control for Microprocessor Energy and Performance , 2006, IEEE Micro.

[10] Gagan Agrawal,et al. Compiler and runtime support for enabling generalized reduction computations on heterogeneous parallel configurations , 2010, ICS '10.

[11] Lizy Kurian John,et al. Complete System Power Estimation: A Trickle-Down Approach Based on Performance Events , 2007, 2007 IEEE International Symposium on Performance Analysis of Systems & Software.

[12] Jian Li,et al. Dynamic power-performance adaptation of parallel computation on chip multiprocessors , 2006, The Twelfth International Symposium on High-Performance Computer Architecture, 2006.

[13] Courtenay T. Vaughan,et al. Energy based performance tuning for large scale high performance computing systems , 2012, HiPC 2012.

[14] Hao Wang,et al. Workload and power budget partitioning for single-chip heterogeneous processors , 2012, 2012 21st International Conference on Parallel Architectures and Compilation Techniques (PACT).

[15] Samuel Williams,et al. The Landscape of Parallel Computing Research: A View from Berkeley, 2006.

[16] Martin Schulz,et al. Practical performance prediction under Dynamic Voltage Frequency Scaling , 2011, 2011 International Green Computing Conference and Workshops.

[17] Nam Sung Kim,et al. Optimizing throughput of power- and thermal-constrained multicore processors using DVFS and per-core power-gating , 2009, 2009 46th ACM/IEEE Design Automation Conference.

[18] Martin Schulz,et al. Bounding energy consumption in large-scale MPI programs , 2007, Proceedings of the 2007 ACM/IEEE Conference on Supercomputing (SC '07).

[19] Gagan Agrawal,et al. A dynamic scheduling framework for emerging heterogeneous systems , 2011, 2011 18th International Conference on High Performance Computing.

[20] Gagan Agrawal,et al. Porting irregular reductions on heterogeneous CPU-GPU configurations , 2011, 2011 18th International Conference on High Performance Computing.

[21] Margaret Martonosi,et al. Stargazer: Automated regression-based GPU design space exploration , 2012, 2012 IEEE International Symposium on Performance Analysis of Systems & Software.

[22] Andrew A. Chien,et al. Abstract: An Exascale Workload Study , 2012, 2012 SC Companion: High Performance Computing, Networking Storage and Analysis.

[23] Margaret Martonosi,et al. Computer Architecture Techniques for Power-Efficiency , 2008, Computer Architecture Techniques for Power-Efficiency.

[24] Peng Wang,et al. Implementing molecular dynamics on hybrid high performance computers - short range forces , 2011, Comput. Phys. Commun..

[25] Ian Karlin,et al. LULESH Programming Model and Performance Ports Overview, 2012.

[26] Lizy Kurian John,et al. Runtime identification of microprocessor energy saving opportunities , 2005, ISLPED '05. Proceedings of the 2005 International Symposium on Low Power Electronics and Design, 2005.

[27] Jia,et al. Stargazer: Automated regression-based GPU design space exploration , 2012, 2012 IEEE International Symposium on Performance Analysis of Systems & Software (ISPASS), New Brunswick, NJ, USA.

[28] Bronis R. de Supinski,et al. Prediction models for multi-dimensional power-performance optimization on many cores , 2008, 2008 International Conference on Parallel Architectures and Compilation Techniques (PACT).

[29] Hyesoon Kim,et al. An integrated GPU power and performance model , 2010, ISCA.

[30] Sudhakar Yalamanchili,et al. Cooperative boosting: needy versus greedy power management , 2013, ISCA.

[31] Sudhakar Yalamanchili,et al. Eiger: A framework for the automated synthesis of statistical performance models , 2012, 2012 19th International Conference on High Performance Computing.

[32] Kevin Skadron,et al. A characterization of the Rodinia benchmark suite with comparison to contemporary CMP workloads , 2010, IEEE International Symposium on Workload Characterization (IISWC'10).

[33] Kevin Skadron,et al. Rodinia: A benchmark suite for heterogeneous computing , 2009, 2009 IEEE International Symposium on Workload Characterization (IISWC).

[34] Andrew A. Chien,et al. Poster: An Exascale Workload Study , 2012, 2012 SC Companion: High Performance Computing, Networking Storage and Analysis.

[35] Nam Sung Kim,et al. Improving Throughput of Power-Constrained GPUs Using Dynamic Voltage/Frequency and Core Scaling , 2011, 2011 International Conference on Parallel Architectures and Compilation Techniques.

[36] Collin McCurdy,et al. The Scalable Heterogeneous Computing (SHOC) benchmark suite , 2010, GPGPU-3.