Accelerating convolution-based detection model on GPU

Convolution-based detection models (CDMs), such as deformable part-based models (DPMs) and convolutional neural networks (CNNs), have achieved tremendous success in computer vision in the last few years. The simplicity of these models allows for very large-scale training to achieve higher robustness and recognition performance. However, the main bottleneck of these powerful state-of-the-art models is the high computational cost of convolution during model training and evaluation, which has become a major limitation in many practical applications. In this paper, we accelerate convolution-based detection models with mathematical and parallel techniques. On the one hand, the convolution operation in the spatial domain is converted to an element-wise product in the frequency domain, which lowers the computational cost. On the other hand, parallelizing the data and tasks on graphics processing units (GPUs) reduces the computation time further. Experimental results on the public Pascal VOC dataset demonstrate that a commodity GPU can speed up the whole convolution process by 2.13x to 4.31x compared to a multithreaded CPU implementation.
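The spatial-to-frequency conversion mentioned in the abstract rests on the convolution theorem: after zero-padding both inputs to the full output size, a linear convolution equals the inverse transform of the element-wise product of the two transforms. The sketch below is an illustration only, not the paper's implementation. It uses a naive O(N^4) discrete Fourier transform in pure Python so that it runs without dependencies, and all function names (`dft2`, `zero_pad`, `conv2d_direct`, `conv2d_frequency`) and the toy inputs are assumptions made for this example. A real system would presumably use an FFT library on the CPU or GPU instead.

```python
import cmath


def dft2(x, inverse=False):
    """Naive 2-D discrete Fourier transform of a list of lists (illustration only)."""
    rows, cols = len(x), len(x[0])
    sign = 1 if inverse else -1
    out = [[0j] * cols for _ in range(rows)]
    for u in range(rows):
        for v in range(cols):
            acc = 0j
            for r in range(rows):
                for c in range(cols):
                    angle = sign * 2 * cmath.pi * (u * r / rows + v * c / cols)
                    acc += x[r][c] * cmath.exp(1j * angle)
            # The inverse transform carries the 1 / (rows * cols) normalization.
            out[u][v] = acc / (rows * cols) if inverse else acc
    return out


def zero_pad(x, rows, cols):
    """Copy x into the top-left corner of a rows-by-cols array of zeros."""
    out = [[0.0] * cols for _ in range(rows)]
    for r, row in enumerate(x):
        for c, val in enumerate(row):
            out[r][c] = val
    return out


def conv2d_direct(image, kernel):
    """Full linear convolution computed directly in the spatial domain."""
    ih, iw = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    out = [[0.0] * (iw + kw - 1) for _ in range(ih + kh - 1)]
    for r in range(ih):
        for c in range(iw):
            for i in range(kh):
                for j in range(kw):
                    out[r + i][c + j] += image[r][c] * kernel[i][j]
    return out


def conv2d_frequency(image, kernel):
    """Same convolution via an element-wise product in the frequency domain."""
    rows = len(image) + len(kernel) - 1
    cols = len(image[0]) + len(kernel[0]) - 1
    # Padding to the full output size makes circular and linear convolution agree.
    f_image = dft2(zero_pad(image, rows, cols))
    f_kernel = dft2(zero_pad(kernel, rows, cols))
    product = [[f_image[u][v] * f_kernel[u][v] for v in range(cols)]
               for u in range(rows)]
    return [[z.real for z in row] for row in dft2(product, inverse=True)]


# Toy inputs (hypothetical values chosen for the example).
image = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
kernel = [[1.0, 0.0], [0.0, -1.0]]

direct = conv2d_direct(image, kernel)
via_frequency = conv2d_frequency(image, kernel)
max_err = max(abs(a - b)
              for row_a, row_b in zip(direct, via_frequency)
              for a, b in zip(row_a, row_b))
print("maximum absolute difference:", max_err)
```

The two results agree up to floating-point rounding. The cost advantage appears only when the naive transform is replaced by a fast Fourier transform, and it grows when the transform of an image can be reused across many filters.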
