Scan primitives for GPU computing

Scan primitives are powerful, general-purpose data-parallel primitives that serve as building blocks for a broad range of applications. We describe implementations of these primitives on NVIDIA GPUs using the CUDA API, in particular an efficient formulation and implementation of segmented scan. Using the scan primitives, we present novel GPU implementations of quicksort and sparse matrix-vector multiply. We also analyze the performance of the scan primitives themselves, of several sort algorithms built on them, and of a graphical shallow-water fluid simulation that uses the scan framework for its tridiagonal matrix solver.
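To make the terminology concrete, the following is a minimal sequential sketch in Python of what scan and segmented scan compute, and of how a segmented sum can drive a sparse matrix-vector multiply. It is a reference for the semantics only, not the CUDA implementation described above. The function names, the head-flag representation of segments, and the CSR storage layout are assumptions chosen for illustration.

```python
import operator


def exclusive_scan(values, op=operator.add, identity=0):
    """Return [identity, v0, v0 op v1, ...]; the final total is dropped."""
    out = []
    running = identity
    for v in values:
        out.append(running)
        running = op(running, v)
    return out


def segmented_inclusive_scan(values, head_flags, op=operator.add):
    """Inclusive scan that restarts wherever head_flags[i] is truthy."""
    out = []
    running = None
    for v, flag in zip(values, head_flags):
        # A head flag (or the very first element) starts a new segment.
        running = v if flag or running is None else op(running, v)
        out.append(running)
    return out


def spmv_csr(row_ptr, col_idx, vals, x):
    """Sparse matrix-vector product using a segmented sum over CSR rows.

    Assumed layout: row_ptr has one entry per row plus a final end offset.
    """
    # Element-wise products of each nonzero with its matching x entry.
    products = [a * x[j] for a, j in zip(vals, col_idx)]
    # Mark the first nonzero of every row as a segment head.
    flags = [0] * len(products)
    for start in row_ptr[:-1]:
        if start < len(flags):
            flags[start] = 1
    sums = segmented_inclusive_scan(products, flags)
    # The last element of each segment holds that row's dot product;
    # rows with no nonzeros contribute 0.
    return [
        sums[end - 1] if end > start else 0
        for start, end in zip(row_ptr, row_ptr[1:])
    ]
```

For example, `segmented_inclusive_scan([3, 1, 7, 0, 4, 1, 6, 3], [1, 0, 1, 0, 0, 1, 0, 0])` sums within the three segments `[3, 1]`, `[7, 0, 4]`, and `[1, 6, 3]` independently. A GPU version would replace the sequential loops with a parallel tree-structured scan, but should produce the same results.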
