Optimizing Convolution Operations in CUDA with Adaptive Tiling

Convolution operations are essential to signal and image processing applications and typically account for a large fraction of their execution time. Existing optimization approaches that support a wide range of filter sizes are too limited. In this paper, we present an optimization approach, called adaptive tiling, for implementing a highly efficient yet flexible convolution operation on modern GPUs. We evaluate the performance impact of each optimization step on the GTX 480 graphics card and show that adaptive tiling improves performance by 34% on average over differently optimized kernels. To the best of our knowledge, our implementation is the most optimized and best-performing implementation of 2D convolution in the spatial domain available to date.
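To make the terms concrete, the following is a minimal CPU-side sketch of 2D spatial-domain filtering, first computed directly and then tile by tile. It is not the adaptive tiling scheme of the paper, whose details are not given in this abstract. The function names, the tile sizes, and the `staged` buffer (a stand-in for GPU shared memory) are assumptions made for illustration only.

```python
def conv2d_direct(image, filt):
    """Valid-mode 2D filtering in the spatial domain.

    Uses the correlation form (the filter is not flipped), which is a
    common simplification; flip `filt` beforehand for true convolution.
    """
    ih, iw = len(image), len(image[0])
    fh, fw = len(filt), len(filt[0])
    oh, ow = ih - fh + 1, iw - fw + 1
    out = [[0.0] * ow for _ in range(oh)]
    for y in range(oh):
        for x in range(ow):
            acc = 0.0
            for j in range(fh):
                for i in range(fw):
                    acc += image[y + j][x + i] * filt[j][i]
            out[y][x] = acc
    return out


def conv2d_tiled(image, filt, tile_h=2, tile_w=4):
    """Same result as conv2d_direct, computed one output tile at a time.

    Each tile first stages the input region it needs, including the
    (fh - 1) x (fw - 1) border shared with neighbouring tiles. On a GPU
    this staging buffer would typically live in shared memory.
    The tile sizes here are hypothetical defaults, not tuned values.
    """
    ih, iw = len(image), len(image[0])
    fh, fw = len(filt), len(filt[0])
    oh, ow = ih - fh + 1, iw - fw + 1
    out = [[0.0] * ow for _ in range(oh)]
    for ty in range(0, oh, tile_h):
        for tx in range(0, ow, tile_w):
            th = min(tile_h, oh - ty)  # clip tiles at the bottom edge
            tw = min(tile_w, ow - tx)  # clip tiles at the right edge
            # Stage the tile's input window, including the border.
            staged = [row[tx:tx + tw + fw - 1]
                      for row in image[ty:ty + th + fh - 1]]
            for y in range(th):
                for x in range(tw):
                    acc = 0.0
                    for j in range(fh):
                        for i in range(fw):
                            acc += staged[y + j][x + i] * filt[j][i]
                    out[ty + y][tx + x] = acc
    return out
```

In this sketch the tile size is a fixed parameter. Larger tiles reuse more of the staged input per output element but need a larger staging buffer, which is the kind of trade-off that depends on the filter size.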
