Optimizing energy consumption in GPUS through feedback-driven CTA scheduling
暂无分享,去创建一个
[1] Mike O'Connor,et al. Cache-Conscious Wavefront Scheduling , 2012, 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture.
[2] Keshav Pingali,et al. A quantitative study of irregular programs on GPUs , 2012, 2012 IEEE International Symposium on Workload Characterization (IISWC).
[3] Wen-mei W. Hwu,et al. Parboil: A Revised Benchmark Suite for Scientific and Commercial Throughput Computing , 2012 .
[4] Collin McCurdy,et al. The Scalable Heterogeneous Computing (SHOC) benchmark suite , 2010, GPGPU-3.
[5] Song Huang,et al. On the energy efficiency of graphics processing units for scientific computing , 2009, 2009 IEEE International Symposium on Parallel & Distributed Processing.
[6] Amin Jadidi,et al. MLC PCM main memory with accelerated read , 2016, 2016 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS).
[7] Hyesoon Kim,et al. An integrated GPU power and performance model , 2010, ISCA.
[8] Amin Jadidi,et al. A morphable phase change memory architecture considering frequent zero values , 2011, 2011 IEEE 29th International Conference on Computer Design (ICCD).
[9] Jian Li,et al. Power-performance considerations of parallel computing on chip multiprocessors , 2005, TACO.
[10] Amin Jadidi,et al. High-endurance and performance-efficient design of hybrid cache architectures through adaptive line replacement , 2011, IEEE/ACM International Symposium on Low Power Electronics and Design.
[11] Tor M. Aamodt,et al. Dynamic Warp Formation and Scheduling for Efficient GPU Control Flow , 2007, 40th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO 2007).
[12] J DallyWilliam,et al. Memory access scheduling , 2000 .
[13] Henry Wong,et al. Analyzing CUDA workloads using a detailed GPU simulator , 2009, 2009 IEEE International Symposium on Performance Analysis of Systems and Software.
[14] Nam Sung Kim,et al. Improving Throughput of Power-Constrained GPUs Using Dynamic Voltage/Frequency and Core Scaling , 2011, 2011 International Conference on Parallel Architectures and Compilation Techniques.
[15] Naga K. Govindaraju,et al. Mars: A MapReduce Framework on graphics processors , 2008, 2008 International Conference on Parallel Architectures and Compilation Techniques (PACT).
[16] Tom R. Halfhill. NVIDIA's Next-Generation CUDA Compute and Graphics Architecture, Code-Named Fermi, Adds Muscle for Parallel Processing , 2009 .
[17] William J. Dally,et al. Memory access scheduling , 2000, Proceedings of 27th International Symposium on Computer Architecture (IEEE Cat. No.RS00201).
[18] Avi Mendelson,et al. Many-Core vs. Many-Thread Machines: Stay Away From the Valley , 2009, IEEE Computer Architecture Letters.