Reducing branch divergence in GPU programs
[1] Tor M. Aamodt et al. Dynamic warp formation: Efficient MIMD control flow on SIMD graphics hardware, 2009, TACO.
[2] Michael Wolfe et al. Implementing the PGI Accelerator model, 2010, GPGPU-3.
[3] Lothar Lilge et al. GPU-accelerated Monte Carlo simulation for photodynamic therapy treatment planning, 2009, European Conference on Biomedical Optics.
[4] Xipeng Shen et al. Streamlining GPU applications on the fly: thread divergence elimination through runtime thread-data remapping, 2010, ICS '10.
[5] Yi Yang et al. A GPGPU compiler for memory optimization and parallelism management, 2010, PLDI '10.
[6] David R. Kaeli et al. Exploiting Memory Access Patterns to Improve Memory Performance in Data-Parallel Architectures, 2011, IEEE Transactions on Parallel and Distributed Systems.
[7] Kevin Skadron et al. Dynamic warp subdivision for integrated branch and memory divergence tolerance, 2010, ISCA.
[8] Benoît Meister et al. A mapping path for multi-GPGPU accelerated computers from a portable high level programming abstraction, 2010, GPGPU-3.
[9] Bjorn De Sutter et al. Compiler techniques for code compaction, 2000, TOPLAS.
[10] Xiaoming Li et al. A control-structure splitting optimization for GPGPU, 2009, CF '09.
[11] Erik Lindholm et al. NVIDIA Tesla: A Unified Graphics and Computing Architecture, 2008, IEEE Micro.
[12] Steven S. Muchnick. Advanced Compiler Design and Implementation, 1997.
[13] J. Ramanujam et al. Automatic C-to-CUDA Code Generation for Affine Programs, 2010, CC.
[14] Rudolf Eigenmann et al. OpenMP to GPGPU: a compiler framework for automatic translation and optimization, 2009, PPoPP '09.
[15] Wen-mei W. Hwu et al. CUDA-Lite: Reducing GPU Programming Complexity, 2008, LCPC.
[16] Wen-mei W. Hwu et al. Program optimization carving for GPU computing, 2008, J. Parallel Distributed Comput.