High-performance signal processing on emerging many-core architectures using CUDA

This paper provides a short introduction to the CUDA programming paradigm and recent standardization efforts. We also present a new 2D Fast Wavelet Transform implementation on GPUs using CUDA. Our implementation achieves a speedup of almost two orders of magnitude over a typical quad-core CPU, demonstrating that emerging many-core architectures are ideal platforms for high-performance signal processing.
