Paper Information - Execution Units Power-Gating to Improve Energy Efficiency of GPGPUs

In this paper, we first examine the distribution and length of execution-unit idle cycles for several typical GPGPU applications, in order to direct energy-saving strategies toward potential execution-unit power-gating opportunities. We record the idle durations of the execution units in SMs (Streaming Multiprocessors), including the integer and floating-point units in SPs (Streaming Processors) and the SFUs (Special Function Units). Second, based on the observed idleness, we study how effectively execution-unit power-gating saves leakage energy under two simple policies: immediate power-gating (IPG) and idle-detect power-gating (ID-PG). We examine the policies under various parameter settings to offer insight into the possible gains and losses of power-gating techniques and to enable smarter strategies in future research. The experimental results show that both policies can achieve satisfactory leakage energy savings on execution units. Immediate power-gating can reduce execution-unit leakage energy by 84.3% when the break-even time is set to 5 cycles, and idle-detect power-gating can save 67.1% of the total execution-unit leakage energy even if the break-even time goes up to 20 cycles.
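The trade-off between the two policies can be illustrated with a minimal sketch. It is a simplified accounting model, not the simulator or methodology used in the paper. It assumes that each gating event costs the equivalent of `break_even` cycles of leakage, that wake-up latency is ignored, and that ID-PG waits a fixed `detect_window` of idle cycles before gating. The function name, parameter names, and idle-period lengths are hypothetical.

```python
from typing import Iterable


def net_leakage_saved(idle_lengths: Iterable[int],
                      break_even: int,
                      detect_window: int = 0) -> int:
    """Net leakage saved, in units of one cycle of leakage energy.

    detect_window = 0 models immediate power-gating (IPG);
    detect_window > 0 models idle-detect power-gating (ID-PG).
    Assumption: gating overhead equals `break_even` cycles of leakage.
    """
    saved = 0
    for length in idle_lengths:
        gated_cycles = length - detect_window
        if gated_cycles <= 0:
            # Idle period ended before the unit was gated: no gain, no loss.
            continue
        # Leakage avoided while gated, minus the fixed gating overhead.
        saved += gated_cycles - break_even
    return saved


# Hypothetical idle periods: many short gaps and one long one.
idle = [2, 2, 2, 50]
ipg = net_leakage_saved(idle, break_even=5)
idpg = net_leakage_saved(idle, break_even=5, detect_window=5)
print(f"IPG net saving: {ipg} cycles, ID-PG net saving: {idpg} cycles")
```

In this model, IPG loses energy on every idle period shorter than the break-even time, while ID-PG skips those periods at the cost of leaving the unit powered during the detection window of each long period.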

Wei Zhang | Xin Wang

[1] Murali Annavaram,et al. PATS: Pattern aware scheduling and power gating for GPGPUs , 2014, 2014 23rd International Conference on Parallel Architecture and Compilation (PACT).

[2] Aaftab Munshi,et al. The OpenCL specification , 2009, 2009 IEEE Hot Chips 21 Symposium (HCS).

[3] Nam Sung Kim,et al. Power-efficient computing for compute-intensive GPGPU applications , 2013, HPCA.

[4] Tor M. Aamodt,et al. Thread block compaction for efficient SIMT control flow , 2011, 2011 IEEE 17th International Symposium on High Performance Computer Architecture.

[5] Tom R. Halfhill. NVIDIA's Next-Generation CUDA Compute and Graphics Architecture, Code-Named Fermi, Adds Muscle for Parallel Processing , 2009 .

[6] Henry Wong,et al. Analyzing CUDA workloads using a detailed GPU simulator , 2009, 2009 IEEE International Symposium on Performance Analysis of Systems and Software.

[7] Sun UltraSPARC,et al. A closer look at GPUs , 2008, Commun. ACM.

[8] Mattan Erez,et al. Maximizing SIMD resource utilization in GPGPUs with SIMD lane permutation , 2013, ISCA.

[9] Pradip Bose,et al. Microarchitectural techniques for power gating of execution units , 2004, Proceedings of the 2004 International Symposium on Low Power Electronics and Design (IEEE Cat. No.04TH8758).

[10] Yue Wang,et al. Run-time power-gating in caches of GPUs for leakage energy savings , 2012, 2012 Design, Automation & Test in Europe Conference & Exhibition (DATE).

[11] Wei Zhang,et al. Drowsy Register Files for Reducing GPU Leakage Energy , 2017, 2017 IEEE 23rd International Conference on Parallel and Distributed Systems (ICPADS).

[12] Mohammad Abdel-Majeed,et al. Warped gates: Gating aware scheduling and power gating for GPGPUs , 2013, 2013 46th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[13] Chia-Lin Yang,et al. Power gating strategies on GPUs , 2011, TACO.

[14] Nam Sung Kim,et al. GPUWattch: enabling energy optimizations in GPGPUs , 2013, ISCA.

[15] Mohammad Abdel-Majeed,et al. Warped register file: A power efficient register file for GPGPUs , 2013, 2013 IEEE 19th International Symposium on High Performance Computer Architecture (HPCA).

[16] Kevin Skadron,et al. Rodinia: A benchmark suite for heterogeneous computing , 2009, 2009 IEEE International Symposium on Workload Characterization (IISWC).

[17] Kevin Skadron,et al. A performance study of general-purpose applications on graphics processors using CUDA , 2008, J. Parallel Distributed Comput..

[18] Tor M. Aamodt,et al. Dynamic Warp Formation and Scheduling for Efficient GPU Control Flow , 2007, 40th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO 2007).

[19] Kevin Skadron,et al. Dynamic warp subdivision for integrated branch and memory divergence tolerance , 2010, ISCA.

[20] Mike Houston,et al. A closer look at GPUs , 2008, Commun. ACM.

[21] Mattan Erez,et al. A locality-aware memory hierarchy for energy-efficient GPU architectures , 2013, 2013 46th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[22] Pushpak Karnick. GPGPU: General Purpose Computing on Graphics Hardware , 2007 .

[23] William J. Dally,et al. Energy-efficient mechanisms for managing thread context in throughput processors , 2011, 2011 38th Annual International Symposium on Computer Architecture (ISCA).