Balancing Scalar and Vector Execution on GPU Architectures
[1] Kevin Skadron, et al. A characterization of the Rodinia benchmark suite with comparison to contemporary CMP workloads, 2010, IEEE International Symposium on Workload Characterization (IISWC'10).
[2] Mike Murphy, et al. Efficient compilation of fine-grained SPMD-threaded programs for multicore CPUs, 2010, CGO '10.
[3] Jung Ho Ahn, et al. McPAT: An integrated power, area, and timing modeling framework for multicore and manycore architectures, 2009, 2009 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[4] Kevin Skadron, et al. Rodinia: A benchmark suite for heterogeneous computing, 2009, 2009 IEEE International Symposium on Workload Characterization (IISWC).
[5] Nam Sung Kim, et al. Power-efficient computing for compute-intensive GPGPU applications, 2013, HPCA.
[6] Yi Yang, et al. Exploiting uniform vector instructions for GPGPU performance, energy efficiency, and opportunistic reliability enhancement, 2013, ICS '13.
[7] Zhongliang Chen, et al. Scalar Waving: Improving the Efficiency of SIMD Execution on GPUs, 2014, 2014 IEEE 28th International Parallel and Distributed Processing Symposium.
[8] Sylvain Collange, et al. Identifying scalar behavior in CUDA kernels, 2011.
[9] Zhongliang Chen, et al. Characterizing scalar opportunities in GPGPU applications, 2013, 2013 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS).
[10] Aaftab Munshi, et al. The OpenCL specification, 2009, 2009 IEEE Hot Chips 21 Symposium (HCS).
[11] Wen-mei W. Hwu, et al. Parboil: A Revised Benchmark Suite for Scientific and Commercial Throughput Computing, 2012.
[12] David Kaeli, et al. Heterogeneous Computing with OpenCL 2.0, 2015.
[13] Fernando Magno Quintão Pereira, et al. Divergence Analysis and Optimizations, 2011, 2011 International Conference on Parallel Architectures and Compilation Techniques.
[14] Onur Mutlu, et al. Improving GPU performance via large warps and two-level warp scheduling, 2011, 2011 44th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[15] Qunfeng Dong, et al. A Case for a Flexible Scalar Unit in SIMT Architecture, 2014, 2014 IEEE 28th International Parallel and Distributed Processing Symposium.
[16] Nam Sung Kim, et al. GPUWattch: enabling energy optimizations in GPGPUs, 2013, ISCA.
[17] Henry Wong, et al. Analyzing CUDA workloads using a detailed GPU simulator, 2009, 2009 IEEE International Symposium on Performance Analysis of Systems and Software.
[18] Yao Zhang, et al. Dynamic Detection of Uniform and Affine Vectors in GPGPU Computations, 2009, Euro-Par Workshops.
[19] Krste Asanovic, et al. Convergence and scalarization for data-parallel architectures, 2013, Proceedings of the 2013 IEEE/ACM International Symposium on Code Generation and Optimization (CGO).