Exploiting Target Data to Learn Deep Convolutional Networks for Scene-Adapted Human Detection

The sample distributions of public data sets can differ significantly from those of specific scenes. As a result, deploying generic human detectors in real-world scenes often leads to sub-optimal detection performance. To avoid labor-intensive manual annotation, we propose a semi-supervised approach for training deep convolutional networks on partially labeled data. To exploit a large amount of unlabeled target data, the knowledge learned from public data sets is transferred to the training of the new model by adapting an auxiliary detector to the target scene. We hypothesize that the components of the auxiliary detector capture essential human characteristics that are useful for constructing a scene-adapted detector. A selective ensemble algorithm is proposed to select the subset of components relevant to the target scene and recombine them. The resulting model is used to collect high-confidence samples from the unlabeled target data. A deep convolutional network is then trained by progressively labeling and selecting new training samples in a self-paced way. Detailed experimental evaluation verifies the effectiveness and superiority of the proposed approach in scene-specific human detection.
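
The self-paced step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the names `select_confident`, `self_paced_labeling`, `score_fn`, and `retrain_fn`, the mapping from score to confidence, and the threshold schedule are all assumptions made for illustration. The sketch only shows the general idea of pseudo-labeling the most confident unlabeled samples first and admitting harder samples in later rounds.

```python
from typing import Callable, Dict, List, Sequence, Tuple


def select_confident(scores: Sequence[float], threshold: float) -> List[Tuple[int, int]]:
    """Return (position, pseudo_label) pairs for confidently scored samples.

    Scores are assumed to lie in [0, 1], where values near 1 mean "human"
    and values near 0 mean "background" (an assumption of this sketch).
    """
    picked = []
    for pos, score in enumerate(scores):
        confidence = abs(score - 0.5) * 2  # 0 = undecided, 1 = certain
        if confidence >= threshold:
            picked.append((pos, 1 if score >= 0.5 else 0))
    return picked


def self_paced_labeling(
    unlabeled: Sequence[object],
    score_fn: Callable[[object], float],
    retrain_fn: Callable[[Dict[int, int]], None],
    thresholds: Sequence[float] = (0.9, 0.7, 0.5),
) -> Dict[int, int]:
    """Pseudo-label samples over several rounds, easiest samples first.

    score_fn stands in for the current detector and retrain_fn for a model
    update; both are hypothetical callbacks supplied by the caller.
    """
    pseudo: Dict[int, int] = {}  # sample index -> pseudo label
    for threshold in thresholds:  # the confidence requirement relaxes each round
        remaining = [i for i in range(len(unlabeled)) if i not in pseudo]
        scores = [score_fn(unlabeled[i]) for i in remaining]
        for pos, label in select_confident(scores, threshold):
            pseudo[remaining[pos]] = label
        retrain_fn(dict(pseudo))  # update the model with the labels gathered so far
    return pseudo
```

In this sketch a sample that never reaches the lowest threshold stays unlabeled, which keeps ambiguous detections out of the training set.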
