Algorithmic strategies for optimizing the parallel reduction primitive in CUDA
Roberto Torres | Pedro J. Martín | Luis F. Ayuso | Antonio Gavilanes