StreamScan: fast scan algorithms for GPUs without global barrier synchronization
[1] Philippas Tsigas,et al. A Practical Quicksort Algorithm for Graphics Processors , 2008, ESA.
[2] Jianliang Xu,et al. GPURoofline: A Model for Guiding Performance Optimizations on GPUs , 2012, Euro-Par.
[3] A. Grimshaw,et al. High Performance and Scalable Radix Sorting: a Case Study of Implementing Dynamic Parallelism for GPU Computing , 2011, Parallel Process. Lett.
[4] Nan Zhang. A Novel Parallel Scan for Multicore Processors and Its Application in Sparse Matrix-Vector Multiplication , 2012, IEEE Transactions on Parallel and Distributed Systems.
[5] John D. Owens,et al. A Work-Efficient Step-Efficient Prefix Sum Algorithm , 2006 .
[6] Michael Garland,et al. Designing efficient sorting algorithms for manycore GPUs , 2009, 2009 IEEE International Symposium on Parallel & Distributed Processing.
[7] Shubhabrata Sengupta,et al. Efficient Parallel Scan Algorithms for GPUs , 2011 .
[8] Philippas Tsigas,et al. On sorting and load balancing on GPUs , 2009, CARN.
[9] Norbert Luttenberger,et al. Fast In-Place Sorting with CUDA Based on Bitonic Sort , 2009, PPAM.
[10] Wu-chun Feng,et al. Inter-block GPU communication via fast barrier synchronization , 2010, 2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS).
[11] H. T. Kung,et al. A Regular Layout for Parallel Adders , 1982, IEEE Transactions on Computers.
[12] Philippas Tsigas,et al. GPU-Quicksort: A practical Quicksort algorithm for graphics processors , 2010, JEAL.
[13] Ulf Assarsson,et al. Efficient stream compaction on wide SIMD many-core architectures , 2009, High Performance Graphics.
[14] Zheng Wei,et al. Optimization of linked list prefix computations on multithreaded GPUs using CUDA , 2010, 2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS).
[15] Naga K. Govindaraju,et al. Fast scan algorithms on graphics processors , 2008, ICS '08.
[16] Andrew S. Grimshaw,et al. Allocation-oriented algorithm design with application to gpu computing , 2011 .
[17] Guy E. Blelloch,et al. Scans as Primitive Parallel Operations , 1989, ICPP.
[18] Harold S. Stone,et al. A Parallel Algorithm for the Efficient Solution of a General Class of Recurrence Equations , 1973, IEEE Transactions on Computers.
[19] Mark J. Harris,et al. Parallel Prefix Sum (Scan) with CUDA , 2011 .
[20] Andrew S. Grimshaw,et al. Parallel Scan for Stream Architectures , 2012 .
[21] Jens Breitbart. Static GPU Threads and an Improved Scan Algorithm , 2010, Euro-Par Workshops.
[22] P J Narayanan,et al. Fast minimum spanning tree for large graphs on the GPU , 2009, High Performance Graphics.
[23] Kenneth E. Iverson,et al. A programming language , 1962, AIEE-IRE '62 (Spring).
[24] Andrew S. Grimshaw,et al. Scalable GPU graph traversal , 2012, PPoPP '12.
[25] Guy E. Blelloch,et al. Prefix sums and their applications , 1990 .
[26] Andrew S. Grimshaw,et al. Revisiting sorting for GPGPU stream architectures , 2010, 2010 19th International Conference on Parallel Architectures and Compilation Techniques (PACT).