Transferring a generic pedestrian detector towards specific scenes

The performance of a generic pedestrian detector may drop significantly when it is applied to a specific scene due to mismatch between the source dataset used to train the detector and samples in the target scene. In this paper, we investigate how to automatically train a scene-specific pedestrian detector starting with a generic detector in video surveillance without further manually labeling any samples under a novel transfer learning framework. It tackles the problem from three aspects. (1) With a graphical representation and through exploring the indegrees from target samples to source samples, the source samples are properly re-weighted. The indegrees detect the boundary between the distributions of the source dataset and the target dataset. The re-weighted source dataset better matches the target scene. (2) It takes the context information from motions, scene structures and scene geometry as the confidence scores of samples from the target scene to guide transfer learning. (3) The confidence scores propagate among samples on a graph according to the underlying visual structures of samples. All these considerations are formulated under a single objective function called Confidence-Encoded SVM. At the test stage, only the appearance-based detector is used without the context cues. The effectiveness of the proposed framework is demonstrated through experiments on two video surveillance datasets. Compared with a generic pedestrian detector, it significantly improves the detection rate by 48% and 36% at one false positive per image on the two datasets respectively.

[1]  Krishna P. Gummadi,et al.  Measurement and analysis of online social networks , 2007, IMC '07.

[2]  Wei Liu,et al.  Noise resistant graph ranking for improved web image search , 2011, CVPR 2011.

[3]  Martial Hebert,et al.  Semi-Supervised Self-Training of Object Detection Models , 2005, 2005 Seventh IEEE Workshops on Applications of Computer Vision (WACV/MOTION'05) - Volume 1.

[4]  Rohini K. Srihari,et al.  Incorporating prior knowledge with weighted margin support vector machines , 2004, KDD.

[5]  Horst Bischof,et al.  Classifier grids for robust adaptive object detection , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[6]  Trevor Darrell,et al.  What you saw is not what you get: Domain adaptation using asymmetric kernel transforms , 2011, CVPR 2011.

[7]  Silvio Savarese,et al.  Cross-view action recognition via view knowledge transfer , 2011, CVPR 2011.

[8]  Charu C. Aggarwal,et al.  Towards cross-category knowledge propagation for learning visual concepts , 2011, CVPR 2011.

[9]  Chih-Jen Lin,et al.  LIBLINEAR: A Library for Large Linear Classification , 2008, J. Mach. Learn. Res..

[10]  Xiaogang Wang,et al.  A discriminative deep model for pedestrian detection with occlusion handling , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[11]  Thomas Hofmann,et al.  Semi-supervised Learning on Directed Graphs , 2004, NIPS.

[12]  Bill Triggs,et al.  Histograms of oriented gradients for human detection , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[13]  Shih-Fu Chang,et al.  Label diagnosis through self tuning for web image search , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[14]  Zoubin Ghahramani,et al.  Learning from labeled and unlabeled data with label propagation , 2002 .

[15]  Paul A. Viola,et al.  Unsupervised improvement of visual detectors using cotraining , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[16]  Shih-Fu Chang,et al.  Cross-domain learning methods for high-level visual concept classification , 2008, 2008 15th IEEE International Conference on Image Processing.

[17]  Vinod Nair,et al.  An unsupervised, online learning framework for moving object detection , 2004, Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2004. CVPR 2004..

[18]  Qiang Yang,et al.  Boosting for transfer learning , 2007, ICML '07.

[19]  Pietro Perona,et al.  Pedestrian Detection: An Evaluation of the State of the Art , 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[20]  Qingming Huang,et al.  Transferring Boosted Detectors Towards Viewpoint and Scene Adaptiveness , 2011, IEEE Transactions on Image Processing.

[21]  David A. McAllester,et al.  Object Detection with Discriminatively Trained Part Based Models , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[22]  W. Eric L. Grimson,et al.  Unsupervised Activity Perception in Crowded and Complicated Scenes Using Hierarchical Bayesian Models , 2008, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[23]  Rong Yan,et al.  Cross-domain video concept detection using adaptive svms , 2007, ACM Multimedia.

[24]  Ramakant Nevatia,et al.  Improving Part based Object Detection by Unsupervised, Online Boosting , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[25]  Jitendra Malik,et al.  Poselets: Body part detectors trained using 3D human pose annotations , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[26]  W. Eric L. Grimson,et al.  Unsupervised Activity Perception by Hierarchical Bayesian Models , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[27]  François Fleuret,et al.  FlowBoost — Appearance learning from sparsely annotated video , 2011, CVPR 2011.

[28]  Meng Wang,et al.  Automatic adaptation of a generic pedestrian detector to a specific traffic scene , 2011, CVPR 2011.