Adaptive Deep Convolutional Neural Networks for Scene-Specific Object Detection

Deep convolutional neural networks (CNNs) have become a widely used tool for object detection, and many previous works have achieved excellent performance on object detection benchmarks. However, these works present generic detectors whose performance drops rapidly when they are applied to a surveillance scene. In this paper, we propose an efficient method to construct a scene-specific regression model based on a generic CNN-based classifier. Our regression model is an adaptive deep CNN (ADCNN), which predicts object locations in the surveillance scene. First, we transfer the generic CNN-based classifier to the surveillance scene by selecting useful kernels. Second, we learn the context information of the surveillance scene in our regression model for accurate location prediction. Our main contributions are: 1) a transfer learning method that selects useful kernels in the generic CNN-based classifier; 2) a special architecture that effectively learns the local and global context information in the surveillance scene; and 3) a new objective function for effectively training the parameters of ADCNN. Compared with several state-of-the-art models, ADCNN achieves the best performance on three surveillance data sets for pedestrian detection and one surveillance data set for vehicle detection.
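The abstract does not state how "useful" kernels are chosen, so the following is only a minimal sketch of the general idea of kernel selection during transfer, under a stated assumption: kernels are ranked by their mean response on samples from the target surveillance scene, and the top few are kept. The function name `select_kernels`, the `keep` parameter, and the mean-response score are all hypothetical and are not taken from the paper.

```python
from typing import List, Sequence


def select_kernels(activations: Sequence[Sequence[float]], keep: int) -> List[int]:
    """Return indices of the `keep` kernels with the highest mean response.

    activations[i][k] is a hypothetical pooled response of kernel k on
    target-scene sample i. The mean-response score is an assumption made
    for illustration, not the selection criterion used in the paper.
    """
    if not activations:
        raise ValueError("need at least one sample")
    num_kernels = len(activations[0])
    if not 0 < keep <= num_kernels:
        raise ValueError("keep must be in 1..num_kernels")
    # Average each kernel's response over all target-scene samples.
    means = [
        sum(sample[k] for sample in activations) / len(activations)
        for k in range(num_kernels)
    ]
    # Rank kernels by mean response, highest first; ties keep index order.
    ranked = sorted(range(num_kernels), key=lambda k: -means[k])
    # Report the kept kernels in their original index order.
    return sorted(ranked[:keep])


# Toy data (made up): 3 target-scene samples, 4 kernels.
responses = [
    [0.9, 0.1, 0.4, 0.0],
    [0.8, 0.0, 0.5, 0.1],
    [0.7, 0.2, 0.6, 0.0],
]
selected = select_kernels(responses, keep=2)
print(selected)
```

In this toy setup the kept indices would then determine which kernels of the generic classifier are carried over to the scene-specific model; any real criterion would likely also use labels or the detection objective rather than raw response magnitude alone.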
