A fast and memory saved GPU acceleration algorithm of convolutional neural networks for target detection

Target detection is a hard real-time task in video and image processing. It has recently been accomplished through the feedforward process of convolutional neural networks (CNNs), which is usually accelerated by general-purpose graphics processing units (GPUs). However, two challenges remain. First, the running speed still needs to be improved. Second, a deeper and larger CNN model is often desirable, but such a model may not be trainable because of the limited GPU memory. In this paper, we present two scheduling algorithms that address these challenges and improve overall system performance. The first is an efficient image combination algorithm that accelerates the feedforward process of the CNN. The second is a light-memory-cost algorithm that trains an arbitrarily large CNN model on a GPU device with limited memory. We run our experiments on a GTX980 card and use a CNN model with 8GB of model parameters, which exceeds the size of the GPU's global memory. Compared with cuDNNv3, our approach obtains a speedup of 6.97x in the detection task.
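The abstract does not spell out how the light-memory-cost algorithm schedules parameters, so the following is only a minimal sketch of one generic way to keep a model larger than device memory usable: group consecutive layers into chunks whose parameters fit a memory budget, so that only one chunk needs to be resident at a time. The function name `plan_chunks`, the per-layer sizes, and the 3 GiB budget are all hypothetical and are not taken from the paper.

```python
from typing import List, Sequence


def plan_chunks(layer_bytes: Sequence[int], budget_bytes: int) -> List[List[int]]:
    """Greedily group consecutive layer indices so each group fits the budget.

    This is an illustrative scheduling heuristic, not the paper's algorithm.
    """
    chunks: List[List[int]] = []
    current: List[int] = []
    used = 0
    for index, size in enumerate(layer_bytes):
        if size > budget_bytes:
            # A single layer that cannot fit would need finer-grained splitting.
            raise ValueError(f"layer {index} alone exceeds the budget")
        if used + size > budget_bytes:
            # Close the current chunk and start a new one with this layer.
            chunks.append(current)
            current, used = [], 0
        current.append(index)
        used += size
    if current:
        chunks.append(current)
    return chunks


GIB = 1024 ** 3
# Hypothetical per-layer parameter sizes that sum to 8 GiB.
layer_sizes = [1 * GIB, 2 * GIB, 1 * GIB, 3 * GIB, 1 * GIB]
# Hypothetical budget left for parameters on the device.
plan = plan_chunks(layer_sizes, budget_bytes=3 * GIB)
print(plan)
```

Under these assumed sizes, each chunk would be copied to the device, used, and released before the next one is loaded. A real scheduler would also have to account for activations, gradients, and transfer overlap, which this sketch ignores.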
