Pedestrian detection based on sparse coding and transfer learning

Pedestrian detection is a fundamental problem in video surveillance and has seen great progress in recent years. However, training a generic detector that performs well across a wide variety of scenes has proved very difficult. On the other hand, exhaustive manual labeling for each specific scene to achieve high detection accuracy is not acceptable, especially for video surveillance applications. To reduce the manual labeling effort without sacrificing detection accuracy, we propose a transfer learning framework based on sparse coding for pedestrian detection. In our method, a generic detector is used to obtain the initial target samples, and several filters are then used to select a small subset of these samples (called target templates) whose labels and confidence values we are highly confident about. The relevancy between source samples and target templates, and the relevancy between target samples and target templates, are estimated by sparse coding and later used to calculate the weights for the source samples and target samples. By applying these sparse coding-based weights to all samples during re-training, we can not only exclude outliers in the source samples but also tackle the drift problem in the target samples, and thus obtain a good scene-specific pedestrian detector. Our experiments on two public datasets show that the trained scene-specific pedestrian detector performs well and is comparable with a detector trained on a large number of manually labeled training samples from the target scene.
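The weighting step described above can be illustrated with a minimal sketch. The abstract does not give the exact formulation, so everything here is an assumption made for illustration: the target templates are taken as the columns of a dictionary `D`, a sample is sparse-coded against them with a plain ISTA solver, and the weight is taken to be `exp(-error / sigma)` of the reconstruction error. The names `sparse_code`, `relevancy_weight`, `lam`, and `sigma` are hypothetical and do not come from the paper.

```python
import numpy as np

def sparse_code(x, D, lam=0.1, n_iter=200):
    """Solve min_a 0.5*||x - D a||^2 + lam*||a||_1 with ISTA.

    x: feature vector of one sample, shape (d,)
    D: target templates as columns, shape (d, k)  (assumed layout)
    """
    # Lipschitz constant of the gradient: squared largest singular value of D.
    L = np.linalg.norm(D, 2) ** 2
    a = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = D.T @ (D @ a - x)          # gradient of the quadratic term
        z = a - grad / L                  # gradient step
        a = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # soft threshold
    return a

def relevancy_weight(x, D, lam=0.1, sigma=1.0):
    """Assumed weighting: samples the templates reconstruct well get weight near 1."""
    a = sparse_code(x, D, lam)
    err = np.sum((x - D @ a) ** 2)        # squared reconstruction error
    return float(np.exp(-err / sigma))

# Toy usage: two templates spanning the first two axes of a 3-D feature space.
D = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [0.0, 0.0]])
inlier = np.array([1.0, 0.0, 0.0])    # lies on a template
outlier = np.array([0.0, 0.0, 1.0])   # orthogonal to every template
w_in = relevancy_weight(inlier, D)
w_out = relevancy_weight(outlier, D)
```

Under these assumptions, a sample that the templates cannot reconstruct (such as `outlier`) receives a smaller weight than one they reconstruct well. Weights of this kind could then be passed as per-sample weights to the re-training step, which is how outlier source samples would be down-weighted.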
