Kill two birds with one stone: Boosting both object detection accuracy and speed with adaptive patch-of-interest composition

Object detection is an important yet challenging task in video understanding and analysis. One major challenge is balancing two conflicting factors: detection accuracy and detection speed. In this paper, we propose a new adaptive patch-of-interest composition approach that boosts both the accuracy and the speed of object detection. The proposed approach first extracts the patches in a video frame that may contain objects-of-interest. An adaptive composition process then composes the extracted patches into an optimal number of sub-frames for object detection. With this process, we maintain the resolution of the original frame during object detection (to preserve accuracy) while minimizing the number of inputs to the detector (to boost speed). Experimental results on various datasets demonstrate the effectiveness of the proposed approach.
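To make the composition step concrete, the following is a minimal sketch of packing patches into as few fixed-size sub-frames as possible without rescaling them. It is not the composition algorithm of the paper, which the abstract does not specify; it swaps in a simple first-fit, decreasing-height shelf-packing heuristic. The names `Shelf`, `SubFrame`, and `compose_patches`, the `(width, height)` patch representation, and the fixed sub-frame size are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Shelf:
    y: int           # top edge of this shelf inside the sub-frame
    height: int      # height of the shelf (set by its first patch)
    used_w: int = 0  # horizontal space already occupied


@dataclass
class SubFrame:
    width: int
    height: int
    shelves: list = field(default_factory=list)
    placements: list = field(default_factory=list)  # (patch_index, x, y, w, h)

    def try_place(self, idx, w, h):
        """Place a patch at its original size; return False if it does not fit."""
        # First try the free space at the end of an existing shelf.
        for shelf in self.shelves:
            if h <= shelf.height and shelf.used_w + w <= self.width:
                self.placements.append((idx, shelf.used_w, shelf.y, w, h))
                shelf.used_w += w
                return True
        # Otherwise open a new shelf below the existing ones.
        next_y = sum(shelf.height for shelf in self.shelves)
        if w <= self.width and next_y + h <= self.height:
            self.shelves.append(Shelf(y=next_y, height=h, used_w=w))
            self.placements.append((idx, 0, next_y, w, h))
            return True
        return False


def compose_patches(patches, frame_w, frame_h):
    """Greedily pack (w, h) patches into sub-frames of size frame_w x frame_h.

    Assumed heuristic (not from the paper): sort patches by decreasing
    height, put each one in the first sub-frame that has room, and open a
    new sub-frame only when none does. Patches are never resized.
    """
    for w, h in patches:
        if w > frame_w or h > frame_h:
            raise ValueError("patch larger than sub-frame")
    order = sorted(range(len(patches)), key=lambda i: patches[i][1], reverse=True)
    sub_frames = []
    for i in order:
        w, h = patches[i]
        if not any(sf.try_place(i, w, h) for sf in sub_frames):
            sf = SubFrame(frame_w, frame_h)
            sf.try_place(i, w, h)
            sub_frames.append(sf)
    return sub_frames


# Hypothetical patch sizes (in pixels) extracted from one video frame.
patches = [(60, 40), (40, 40), (50, 30), (50, 30), (100, 50)]
result = compose_patches(patches, frame_w=100, frame_h=100)
```

Each entry in `placements` records where a patch lands in its sub-frame, so detections made on a sub-frame can be mapped back to coordinates in the original frame. A greedy heuristic like this one does not guarantee the minimum number of sub-frames; it only illustrates the trade-off the abstract describes, where fewer detector inputs are obtained without reducing patch resolution.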
