Pedestrian Detection via Bi-directional Multi-scale Analysis

Scale analysis plays a vital role in pedestrian detection. Conventional approaches usually directly concatenate multi-scale outputs, which is only capable of modeling first-order dependency among various scales. In contrast, this work proposes a novel scale-context modeling scheme by exploiting the highly nonlinear dependency among scales. The proposed scheme aggregates output response maps from mid-results of convolutional layers via a bi-directional recurrent sub-network. Therefore scale information could flow among different layers and implicit underlying dependency structure information in the scale space would be disclosed, which yields more consistency detection. Experimental results on Caltech Pedestrian detection benchmark demonstrate the superior detection (state-of-the-art miss rate of 8.56%) of the proposed method over prior art.

[1]  Navdeep Jaitly,et al.  Hybrid speech recognition with Deep Bidirectional LSTM , 2013, 2013 IEEE Workshop on Automatic Speech Recognition and Understanding.

[2]  Trevor Darrell,et al.  Long-term recurrent convolutional networks for visual recognition and description , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[3]  Jungwon Lee,et al.  Fused DNN: A Deep Neural Network Fusion Approach to Fast and Robust Pedestrian Detection , 2016, 2017 IEEE Winter Conference on Applications of Computer Vision (WACV).

[4]  Xiaogang Wang,et al.  Switchable Deep Network for Pedestrian Detection , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[5]  Wenpeng Yin,et al.  Comparative Study of CNN and RNN for Natural Language Processing , 2017, ArXiv.

[6]  Joon Hee Han,et al.  Local Decorrelation For Improved Pedestrian Detection , 2014, NIPS.

[7]  Luc Van Gool,et al.  A mobile vision system for robust multi-person tracking , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[8]  Bernt Schiele,et al.  Taking a deeper look at pedestrians , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[9]  Andrew Y. Ng,et al.  Parsing Natural Scenes and Natural Language with Recursive Neural Networks , 2011, ICML.

[10]  Luc Van Gool,et al.  Handling Occlusions with Franken-Classifiers , 2013, 2013 IEEE International Conference on Computer Vision.

[11]  Trevor Darrell,et al.  Caffe: Convolutional Architecture for Fast Feature Embedding , 2014, ACM Multimedia.

[12]  Anton van den Hengel,et al.  Strengthening the Effectiveness of Pedestrian Detection with Spatially Pooled Features , 2014, ECCV.

[13]  Jiaxing Zhang,et al.  Scale-Invariant Convolutional Neural Networks , 2014, ArXiv.

[14]  Xiaogang Wang,et al.  Gated Bi-directional CNN for Object Detection , 2016, ECCV.

[15]  Liang Lin,et al.  Is Faster R-CNN Doing Well for Pedestrian Detection? , 2016, ECCV.

[16]  Trevor Darrell,et al.  Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation , 2013, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[17]  Koen E. A. van de Sande,et al.  Selective Search for Object Recognition , 2013, International Journal of Computer Vision.

[18]  Kavita Bala,et al.  Inside-Outside Net: Detecting Objects in Context with Skip Pooling and Recurrent Neural Networks , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[19]  Pietro Perona,et al.  Pedestrian Detection: An Evaluation of the State of the Art , 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[20]  Shuicheng Yan,et al.  Scale-Aware Fast R-CNN for Pedestrian Detection , 2015, IEEE Transactions on Multimedia.

[21]  Rogério Schmidt Feris,et al.  A Unified Multi-scale Deep Convolutional Neural Network for Fast Object Detection , 2016, ECCV.

[22]  Xiaogang Wang,et al.  Pedestrian detection aided by deep learning semantic tasks , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[23]  Pietro Perona,et al.  Fast Feature Pyramids for Object Detection , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[24]  Camille Couprie,et al.  Learning Hierarchical Features for Scene Labeling , 2013, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[25]  Xiaogang Wang,et al.  Deep Learning Strong Parts for Pedestrian Detection , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[26]  Xiaogang Wang,et al.  A discriminative deep model for pedestrian detection with occlusion handling , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[27]  Bernt Schiele,et al.  Filtered channel features for pedestrian detection , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[28]  Wei Liu,et al.  SSD: Single Shot MultiBox Detector , 2015, ECCV.

[29]  David A. McAllester,et al.  Object Detection with Discriminatively Trained Part Based Models , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[30]  Jürgen Schmidhuber,et al.  Bidirectional LSTM Networks for Improved Phoneme Classification and Recognition , 2005, ICANN.

[31]  Xiaogang Wang,et al.  Modeling Mutual Visibility Relationship in Pedestrian Detection , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[32]  M. Gribaudo,et al.  2002 , 2001, Cell and Tissue Research.

[33]  Fuchun Sun,et al.  HyperNet: Towards Accurate Region Proposal Generation and Joint Object Detection , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[34]  Kaiming He,et al.  Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[35]  David W. Jacobs,et al.  Locally Scale-Invariant Convolutional Neural Networks , 2014, ArXiv.

[36]  Michael S. Bernstein,et al.  ImageNet Large Scale Visual Recognition Challenge , 2014, International Journal of Computer Vision.

[37]  Ross B. Girshick,et al.  Fast R-CNN , 2015, 1504.08083.

[38]  Svetlana Lazebnik,et al.  Multi-scale Orderless Pooling of Deep Convolutional Activation Features , 2014, ECCV.

[39]  Yi Yang,et al.  Hierarchical Recurrent Neural Encoder for Video Representation with Application to Captioning , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).