Pedestrian detection aided by deep learning semantic tasks

Deep learning methods have achieved great successes in pedestrian detection, owing to its ability to learn discriminative features from raw pixels. However, they treat pedestrian detection as a single binary classification task, which may confuse positive with hard negative samples (Fig.1 (a)). To address this ambiguity, this work jointly optimize pedestrian detection with semantic tasks, including pedestrian attributes (e.g. `carrying backpack') and scene attributes (e.g. `vehicle', `tree', and `horizontal'). Rather than expensively annotating scene attributes, we transfer attributes information from existing scene segmentation datasets to the pedestrian dataset, by proposing a novel deep model to learn high-level features from multiple tasks and multiple data sources. Since distinct tasks have distinct convergence rates and data from different datasets have different distributions, a multi-task deep model is carefully designed to coordinate tasks and reduce discrepancies among datasets. Extensive evaluations show that the proposed approach outperforms the state-of-the-art on the challenging Caltech [9] and ETH [10] datasets where it reduces the miss rates of previous deep models by 17 and 5.5 percent, respectively.

[1]  Paul A. Viola,et al.  Robust Real-Time Face Detection , 2001, International Journal of Computer Vision.

[2]  Xiaogang Wang,et al.  A discriminative deep model for pedestrian detection with occlusion handling , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[3]  Xiaoou Tang,et al.  Facial Landmark Detection by Deep Multi-task Learning , 2014, ECCV.

[4]  Bernt Schiele,et al.  Taking a deeper look at pedestrians , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[5]  Bernt Schiele,et al.  Filtered channel features for pedestrian detection , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[6]  Geoffrey E. Hinton,et al.  Learning representations by back-propagating errors , 1986, Nature.

[7]  Luc Van Gool,et al.  Seeking the Strongest Rigid Detector , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[8]  Anton van den Hengel,et al.  Training Effective Node Classifiers for Cascade Classification , 2013, International Journal of Computer Vision.

[9]  Xiaoou Tang,et al.  Pedestrian Attribute Recognition At Far Distance , 2014, ACM Multimedia.

[10]  Stephen Gould,et al.  Decomposing a scene into geometric and semantically consistent regions , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[11]  Geoffrey E. Hinton,et al.  Rectified Linear Units Improve Restricted Boltzmann Machines , 2010, ICML.

[12]  Shuicheng Yan,et al.  An HOG-LBP human detector with partial occlusion handling , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[13]  Luc Van Gool,et al.  Depth and Appearance for Mobile Scene Analysis , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[14]  Armin B. Cremers,et al.  Informed Haar-Like Features Improve Pedestrian Detection , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[15]  Yann LeCun,et al.  Pedestrian Detection with Unsupervised Multi-stage Feature Learning , 2012, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[16]  Bernt Schiele,et al.  Ten Years of Pedestrian Detection, What Have We Learned? , 2014, ECCV Workshops.

[17]  Shengcai Liao,et al.  Robust Multi-resolution Pedestrian Detection in Traffic Scenes , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[18]  Anton van den Hengel,et al.  Strengthening the Effectiveness of Pedestrian Detection with Spatially Pooled Features , 2014, ECCV.

[19]  Rich Caruana,et al.  Multitask Learning , 1998, Encyclopedia of Machine Learning and Data Mining.

[20]  Anton van den Hengel,et al.  Pedestrian Detection with Spatially Pooled Features and Structured Ensemble Learning , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[21]  Xiaogang Wang,et al.  Pedestrian Parsing via Deep Decompositional Network , 2013, 2013 IEEE International Conference on Computer Vision.

[22]  Jing Xiao,et al.  Detection Evolution with Multi-order Contextual Co-occurrence , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[23]  Bill Triggs,et al.  Histograms of oriented gradients for human detection , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[24]  Pietro Perona,et al.  Pedestrian Detection: An Evaluation of the State of the Art , 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[25]  David A. McAllester,et al.  Object Detection with Discriminatively Trained Part Based Models , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[26]  Paul A. Viola,et al.  Detecting Pedestrians Using Patterns of Motion and Appearance , 2005, International Journal of Computer Vision.

[27]  Xiaogang Wang,et al.  Modeling Mutual Visibility Relationship in Pedestrian Detection , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[28]  Xiaogang Wang,et al.  Multi-stage Contextual Deep Learning for Pedestrian Detection , 2013, 2013 IEEE International Conference on Computer Vision.

[29]  Pietro Perona,et al.  Integral Channel Features , 2009, BMVC.

[30]  Jianfeng Gao,et al.  Scalable training of L1-regularized log-linear models , 2007, ICML '07.

[31]  Geoffrey E. Hinton,et al.  Reducing the Dimensionality of Data with Neural Networks , 2006, Science.

[32]  Xiaogang Wang,et al.  Joint Deep Learning for Pedestrian Detection , 2013, 2013 IEEE International Conference on Computer Vision.

[33]  Svetlana Lazebnik,et al.  Superparsing , 2010, International Journal of Computer Vision.

[34]  Long Zhu,et al.  Learning a Hierarchical Deformable Template for Rapid Deformable Object Parsing , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[35]  Xiaogang Wang,et al.  Switchable Deep Network for Pedestrian Detection , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[36]  Roberto Cipolla,et al.  Segmentation and Recognition Using Structure from Motion Point Clouds , 2008, ECCV.

[37]  Joon Hee Han,et al.  Local Decorrelation For Improved Pedestrian Detection , 2014, NIPS.

[38]  Deva Ramanan,et al.  Exploring Weak Stabilization for Motion Feature Extraction , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[39]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[40]  Luc Van Gool,et al.  Pedestrian detection at 100 frames per second , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[41]  Larry S. Davis,et al.  Shape-Based Human Detection and Segmentation via Hierarchical Part-Template Matching , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[42]  Pietro Perona,et al.  Fast Feature Pyramids for Object Detection , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.