Exploiting context based on CNN and coding representations for pedestrian co-detection

The exploitation of contextual information among multiple images has been proven significant to improve detection performance by object co-detection methods. In this paper, we propose a pedestrian co-detection method that combines the strengths of convolutional neural networks (CNNs) and locality-constrained linear coding (LLC) in a unified conditional random field (CRF) model. First, we obtain object candidates by using a region proposal network (RPN) in Faster R-CNN. Second, we build a fully connected CRF that consists of unary potentials on individual object candidates and two types of pairwise potentials on pairs of object candidates. The unary potential is computed independently for each object candidate by using the baseline method. The pairwise potentials consist of multiscale CNN and LLC representation-based potentials, which contribute to the capturing of relationships among object candidates in all the test images. Finally, we jointly predict the category labels of all the object candidates through the mean field inference in the CRF. We evaluated the proposed method on the ETH, Caltech, and INRIA Pedestrian datasets. The experimental results demonstrate the effectiveness of the proposed method as compared to the baseline method.

[1]  David A. McAllester,et al.  Object Detection with Discriminatively Trained Part Based Models , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[2]  Bin Yang,et al.  Convolutional Channel Features , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[3]  Silvio Savarese,et al.  Object Co-detection , 2012, ECCV.

[4]  Deva Ramanan,et al.  Histograms of Sparse Codes for Object Detection , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[5]  Dong Liu,et al.  Robust Object Co-detection , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[6]  Shibiao Xu,et al.  Real-time pedestrian detection via hierarchical convolutional feature , 2018, Multimedia Tools and Applications.

[7]  Yihong Gong,et al.  Locality-constrained Linear Coding for image classification , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[8]  Xiaoming Liu,et al.  Illuminating Pedestrians via Simultaneous Detection and Segmentation , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[9]  Ting Rui,et al.  Pedestrian detection based on multi-convolutional features by feature maps pruning , 2017, Multimedia Tools and Applications.

[10]  Ming-Hsuan Yang,et al.  Top-down visual saliency via joint CRF and dictionary learning , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[11]  Renjie Liao,et al.  CoDeL: A Human Co-detection and Labeling Framework , 2013, 2013 IEEE International Conference on Computer Vision.

[12]  Kavita Bala,et al.  Inside-Outside Net: Detecting Objects in Context with Skip Pooling and Recurrent Neural Networks , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[13]  Shuicheng Yan,et al.  Scale-Aware Fast R-CNN for Pedestrian Detection , 2015, IEEE Transactions on Multimedia.

[14]  Qiang Chen,et al.  Contextualizing Object Detection and Classification , 2011, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[15]  Andrew Adams,et al.  Fast High‐Dimensional Filtering Using the Permutohedral Lattice , 2010, Comput. Graph. Forum.

[16]  Mihai Ciuc,et al.  Normalized Autobinomial Markov Channels For Pedestrian Detection , 2015, BMVC.

[17]  Anton van den Hengel,et al.  Strengthening the Effectiveness of Pedestrian Detection with Spatially Pooled Features , 2014, ECCV.

[18]  Osamu Hasegawa,et al.  Random Field Model for Integration of Local Information and Global Information , 2008, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[19]  Joon Hee Han,et al.  Local Decorrelation For Improved Pedestrian Detection , 2014, NIPS.

[20]  Nikos Komodakis,et al.  Object Detection via a Multi-region and Semantic Segmentation-Aware CNN Model , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[21]  Bernt Schiele,et al.  Filtered channel features for pedestrian detection , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[22]  Yihong Gong,et al.  Linear spatial pyramid matching using sparse coding for image classification , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[23]  Iasonas Kokkinos,et al.  DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs , 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[24]  Xiaogang Wang,et al.  Joint Deep Learning for Pedestrian Detection , 2013, 2013 IEEE International Conference on Computer Vision.

[25]  Jitendra Malik,et al.  Region-Based Convolutional Networks for Accurate Object Detection and Segmentation , 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[26]  Anton van den Hengel,et al.  Training Effective Node Classifiers for Cascade Classification , 2013, International Journal of Computer Vision.

[27]  Antonio Criminisi,et al.  TextonBoost for Image Understanding: Multi-Class Object Recognition and Segmentation by Jointly Modeling Texture, Layout, and Context , 2007, International Journal of Computer Vision.

[28]  Vibhav Vineet,et al.  Filter-Based Mean-Field Inference for Random Fields with Higher-Order Terms and Product Label-Spaces , 2012, International Journal of Computer Vision.

[29]  Sanjiv Kumar,et al.  Discriminative Random Fields , 2006, International Journal of Computer Vision.

[30]  Kaiming He,et al.  Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[31]  Carsten Rother,et al.  Conditional Random Fields Meet Deep Neural Networks for Semantic Segmentation: Combining Probabilistic Graphical Models with Deep Learning for Structured Prediction , 2018, IEEE Signal Processing Magazine.

[32]  Pietro Perona,et al.  Quickly Boosting Decision Trees - Pruning Underachieving Features Early , 2013, ICML.

[33]  Xuming He,et al.  Structural Kernel Learning for Large Scale Multiclass Object Co-detection , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[34]  Vladlen Koltun,et al.  Efficient Inference in Fully Connected CRFs with Gaussian Edge Potentials , 2011, NIPS.

[35]  Liang Lin,et al.  Is Faster R-CNN Doing Well for Pedestrian Detection? , 2016, ECCV.

[36]  Cordelia Schmid,et al.  Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[37]  Jian Sun,et al.  Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[38]  Bill Triggs,et al.  Histograms of oriented gradients for human detection , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[39]  Pietro Perona,et al.  Pedestrian Detection: An Evaluation of the State of the Art , 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[40]  Koen E. A. van de Sande,et al.  Selective Search for Object Recognition , 2013, International Journal of Computer Vision.

[41]  Philip H. S. Torr,et al.  Higher Order Conditional Random Fields in Deep Neural Networks , 2015, ECCV.

[42]  Trevor Darrell,et al.  LSDA: Large Scale Detection through Adaptation , 2014, NIPS.

[43]  Philip H. S. Torr,et al.  Conditional Random Fields Meet Deep Neural Networks for Semantic Segmentation , 2017 .

[44]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[45]  Jie Yang,et al.  Saliency Detection by Fully Learning a Continuous Conditional Random Field , 2017, IEEE Transactions on Multimedia.

[46]  Luc Van Gool,et al.  Seeking the Strongest Rigid Detector , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[47]  Rogério Schmidt Feris,et al.  A Unified Multi-scale Deep Convolutional Neural Network for Fast Object Detection , 2016, ECCV.

[48]  Pushmeet Kohli,et al.  On Detection of Multiple Object Instances Using Hough Transforms , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[49]  Xuming He,et al.  Object Co-detection via Efficient Inference in a Fully-Connected CRF , 2014, ECCV.

[50]  Fan Yang,et al.  Exploit All the Layers: Fast and Accurate CNN Object Detector with Scale Dependent Pooling and Cascaded Rejection Classifiers , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[51]  Martin Kampel,et al.  Profile-based Pottery Reconstruction , 2003, 2003 Conference on Computer Vision and Pattern Recognition Workshop.

[52]  David Vázquez,et al.  Random Forests of Local Experts for Pedestrian Detection , 2013, 2013 IEEE International Conference on Computer Vision.

[53]  Ross B. Girshick,et al.  Fast R-CNN , 2015, 1504.08083.

[54]  Nuno Vasconcelos,et al.  Learning Complexity-Aware Cascades for Deep Pedestrian Detection , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).