GPUWattch: enabling energy optimizations in GPGPUs
Nam Sung Kim | Jingwen Leng | Vijay Janapa Reddi | Tor M. Aamodt | Tayler H. Hetherington | Syed Zohaib Gilani | Ahmed ElTantawy
[1] Tor M. Aamodt, et al. Thread block compaction for efficient SIMT control flow, 2011, 2011 IEEE 17th International Symposium on High Performance Computer Architecture.
[2] Satoshi Matsuoka, et al. Statistical power modeling of GPU kernels using performance counters, 2010, International Conference on Green Computing.
[3] William J. Dally. Moving the needle, computer architecture research in academe and industry, 2010, ISCA '10.
[4] Onur Mutlu, et al. Improving GPU performance via large warps and two-level warp scheduling, 2011, 2011 44th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[5] Bin Li, et al. Performance and Power Analysis of ATI GPU: A Statistical Approach, 2011, 2011 IEEE Sixth International Conference on Networking, Architecture, and Storage.
[6] Jung Ho Ahn, et al. A Comprehensive Memory Modeling Tool and Its Application to the Design and Analysis of Future Memory Hierarchies, 2008, 2008 International Symposium on Computer Architecture.
[7] Norman P. Jouppi, et al. Optimizing NUCA Organizations and Wiring Alternatives for Large Caches with CACTI 6.0, 2007, 40th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO 2007).
[8] S.H. Dhong, et al. A fully pipelined single-precision floating-point unit in the synergistic processor element of a CELL processor, 2006, IEEE Journal of Solid-State Circuits.
[9] Thomas Vogelsang, et al. Understanding the Energy Consumption of Dynamic Random Access Memories, 2010, 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture.
[10] Margaret Martonosi, et al. Wattch: a framework for architectural-level power analysis and optimizations, 2000, Proceedings of 27th International Symposium on Computer Architecture.
[11] Tor M. Aamodt, et al. Dynamic Warp Formation and Scheduling for Efficient GPU Control Flow, 2007, 40th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO 2007).
[12] Erik Lindholm, et al. NVIDIA Tesla: A Unified Graphics and Computing Architecture, 2008, IEEE Micro.
[13] Pradip Bose, et al. Stretching the limits of clock-gating efficiency in server-class processors, 2005, 11th International Symposium on High-Performance Computer Architecture.
[14] David R. Kaeli, et al. Multi2Sim: A simulation framework for CPU-GPU computing, 2012, 2012 21st International Conference on Parallel Architectures and Compilation Techniques (PACT).
[15] G. D. Peterson, et al. Power Aware Computing on GPUs, 2012, 2012 Symposium on Application Accelerators in High Performance Computing.
[16] Yiran Chen, et al. Deterministic clock gating for microprocessor power reduction, 2003, The Ninth International Symposium on High-Performance Computer Architecture (HPCA-9).
[17] Jung Ho Ahn, et al. McPAT: An integrated power, area, and timing modeling framework for multicore and manycore architectures, 2009, 2009 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[18] Margaret Martonosi, et al. Live, Runtime Phase Monitoring and Prediction on Real Systems with Application to Dynamic Power Management, 2006, 2006 39th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'06).
[19] Margaret Martonosi, et al. A dynamic compilation framework for controlling microprocessor energy and performance, 2005, 38th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'05).
[20] Nam Sung Kim, et al. Lossless and lossy memory I/O link compression for improving performance of GPGPU workloads, 2012, 2012 21st International Conference on Parallel Architectures and Compilation Techniques (PACT).
[21] Meeta Sharma Gupta, et al. System level analysis of fast, per-core DVFS using on-chip switching regulators, 2008, 2008 IEEE 14th International Symposium on High Performance Computer Architecture.
[22] Kevin Skadron, et al. Rodinia: A benchmark suite for heterogeneous computing, 2009, 2009 IEEE International Symposium on Workload Characterization (IISWC).
[23] Henry Wong, et al. Analyzing CUDA workloads using a detailed GPU simulator, 2009, 2009 IEEE International Symposium on Performance Analysis of Systems and Software.
[24] Arnaud Tisserand, et al. Power Consumption of GPUs from a Software Perspective, 2009, ICCS.
[25] Nam Sung Kim, et al. Improving Throughput of Power-Constrained GPUs Using Dynamic Voltage/Frequency and Core Scaling, 2011, 2011 International Conference on Parallel Architectures and Compilation Techniques.
[26] Hyesoon Kim, et al. An integrated GPU power and performance model, 2010, ISCA.
[27] Haifeng Wang, et al. Power Estimating Model and Analysis of General Programming on GPU, 2012, J. Softw.
[28] Brucek Khailany, et al. CudaDMA: Optimizing GPU memory bandwidth via warp specialization, 2011, 2011 International Conference for High Performance Computing, Networking, Storage and Analysis (SC).