Paper information - Efficient template matching with variable size templates in CUDA

Efficient template matching with variable size templates in CUDA

Graphics processing units (GPUs) offer significantly higher peak performance than CPUs, but for a limited problem space. Even within this space, GPU solutions are often restricted to a set of specific problem instances or offer greatly varying performance for slightly different parameters. This makes providing a library of GPU implementations that is adaptable to arbitrary inputs a difficult task. This research is motivated by a MATLAB lung tumor tracking application that relies on two-dimensional correlation and uses large template sizes. While GPU-based template matching has been addressed in the past, template sizes were limited to specific, relatively small sizes and not acceptable for accelerating the target application. This paper discusses a CUDA implementation that supports large template sizes and is adaptable to arbitrary template dimensions. The implementation uses on-demand compilation of kernels and compile-time expansion of various kernel parameters to improve the implementation adaptability without sacrificing performance.
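The idea of compile-time expansion of kernel parameters can be illustrated with a minimal host-side C++ sketch. It is not the paper's implementation: the names `correlate2d` and `best_match` are hypothetical, the score is a plain unnormalized sum of products rather than whatever correlation measure the target application uses, and the template dimensions are fixed at build time instead of being compiled on demand. In CUDA the same template parameters could be applied to a `__global__` kernel so that each template size gets its own specialized code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not from the paper). The template height TH and width
// TW are compile-time constants, so the two inner loops have fixed trip
// counts that the compiler is free to unroll for each instantiation.
template <std::size_t TH, std::size_t TW>
std::vector<float> correlate2d(const std::vector<float>& image,
                               std::size_t imgH, std::size_t imgW,
                               const std::vector<float>& tmpl) {
    static_assert(TH > 0 && TW > 0, "template must be non-empty");
    assert(image.size() == imgH * imgW);
    assert(tmpl.size() == TH * TW);
    assert(imgH >= TH && imgW >= TW);

    // One score per position where the template fits entirely in the image.
    const std::size_t outH = imgH - TH + 1;
    const std::size_t outW = imgW - TW + 1;
    std::vector<float> scores(outH * outW, 0.0f);

    for (std::size_t y = 0; y < outH; ++y) {
        for (std::size_t x = 0; x < outW; ++x) {
            float acc = 0.0f;
            // Assumption: unnormalized sum of products as the match score.
            for (std::size_t ty = 0; ty < TH; ++ty) {
                for (std::size_t tx = 0; tx < TW; ++tx) {
                    acc += image[(y + ty) * imgW + (x + tx)] * tmpl[ty * TW + tx];
                }
            }
            scores[y * outW + x] = acc;
        }
    }
    return scores;
}

// Hypothetical helper: index of the highest score in row-major order.
// The score vector must be non-empty.
std::size_t best_match(const std::vector<float>& scores) {
    assert(!scores.empty());
    const auto it = std::max_element(scores.begin(), scores.end());
    return static_cast<std::size_t>(it - scores.begin());
}
```

A call such as `correlate2d<2, 2>(image, 3, 3, tmpl)` instantiates a version specialized for a 2x2 template. Supporting arbitrary sizes chosen at run time would require generating and compiling such instantiations on demand, which is the part of the approach this sketch leaves out.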

Laurie A. Smith King | Miriam Leeser | Nicholas Moore

[1] Steve B. Jiang, et al. Multiple template-based fluoroscopic tracking of lung tumor mass without implanted fiducial markers, 2007, Physics in Medicine and Biology.

[2] Miriam Leeser, et al. Accelerating a MATLAB Application with Nvidia GPUs: a Case Study for GPU Library Construction, 2009.

[3] Huiyang Zhou, et al. Accelerating MATLAB Image Processing Toolbox functions on GPUs, 2010, GPGPU-3.