论文信息 - Learning Mutual Visibility Relationship for Pedestrian Detection with a Deep Model

Learning Mutual Visibility Relationship for Pedestrian Detection with a Deep Model

Detecting pedestrians in cluttered scenes is a challenging problem in computer vision. The difficulty is added when several pedestrians overlap in images and occlude each other. We observe, however, that the occlusion/visibility statuses of overlapping pedestrians provide useful mutual relationship for visibility estimation—the visibility estimation of one pedestrian facilitates the visibility estimation of another. In this paper, we propose a mutual visibility deep model that jointly estimates the visibility statuses of overlapping pedestrians. The visibility relationship among pedestrians is learned from the deep model for recognizing co-existing pedestrians. Then the evidence of co-existing pedestrians is used for improving the single pedestrian detection results. Compared with existing image-based pedestrian detection approaches, our approach has the lowest average miss rate on the Caltech-Train dataset and the ETH dataset. Experimental results show that the mutual visibility deep model effectively improves the pedestrian detection results. The mutual visibility deep model leads to 6–15 % improvements on multiple benchmark datasets.

[1] Song-Chun Zhu,et al. A Numerical Study of the Bottom-Up and Top-Down Inference Processes in And-Or Graphs , 2011, International Journal of Computer Vision.

[2] Ming Yang,et al. 3D Convolutional Neural Networks for Human Action Recognition , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[3] Junjie Yan,et al. Multi-pedestrian detection in crowded scenes: A global view , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[4] Bernt Schiele,et al. A Performance Evaluation of Single and Multi-feature People Detection , 2008, DAGM-Symposium.

[5] Yee Whye Teh,et al. A Fast Learning Algorithm for Deep Belief Nets , 2006, Neural Computation.

[6] Ramakant Nevatia,et al. Detection of multiple, partially occluded humans in a single image by Bayesian combination of edgelet part detectors , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[7] Xiaogang Wang,et al. Single-Pedestrian Detection Aided by Multi-pedestrian Detection , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[8] Tsuhan Chen,et al. Extracting adaptive contextual cues from unlabeled regions , 2011, 2011 International Conference on Computer Vision.

[9] Mohammad Norouzi,et al. Stacks of convolutional Restricted Boltzmann Machines for shift-invariant feature learning , 2009, CVPR.

[10] Shihong Lao,et al. A Structural Filter Approach to Human Detection , 2010, ECCV.

[11] Xiaogang Wang,et al. Joint Deep Learning for Pedestrian Detection , 2013, 2013 IEEE International Conference on Computer Vision.

[12] Pietro Perona,et al. The Fastest Pedestrian Detector in the West , 2010, BMVC.

[13] Yann LeCun,et al. What is the best multi-stage architecture for object recognition? , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[14] Ali Farhadi,et al. Recognition using visual phrases , 2011, CVPR 2011.

[15] Bernt Schiele,et al. New features and insights for pedestrian detection , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[16] Peter V. Gehler,et al. Occlusion Patterns for Object Class Detection , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[17] Piotr Dollár,et al. Crosstalk Cascades for Frame-Rate Pedestrian Detection , 2012, ECCV.

[18] Xiaogang Wang,et al. Hierarchical face parsing via deep learning , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[19] Pietro Perona,et al. Integral Channel Features , 2009, BMVC.

[20] Marc'Aurelio Ranzato,et al. Building high-level features using large scale unsupervised learning , 2011, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.

[21] Anton van den Hengel,et al. Efficient Pedestrian Detection by Directly Optimizing the Partial Area under the ROC Curve , 2013, 2013 IEEE International Conference on Computer Vision.

[22] Yi Yang,et al. Recognizing proxemics in personal photos , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[23] Bernt Schiele,et al. Pedestrian detection in crowded scenes , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[24] Geoffrey E. Hinton. Training Products of Experts by Minimizing Contrastive Divergence , 2002, Neural Computation.

[25] Ram Nevatia,et al. Detection and Segmentation of Multiple, Partially Occluded Objects by Grouping, Merging, Assigning Part Detection Responses , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[26] Xiaogang Wang,et al. Deep Learning Face Representation from Predicting 10,000 Classes , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[27] David Vázquez,et al. Random Forests of Local Experts for Pedestrian Detection , 2013, 2013 IEEE International Conference on Computer Vision.

[28] Aggelos K. Katsaggelos,et al. Detector Ensemble , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[29] Peter Glöckner,et al. Why Does Unsupervised Pre-training Help Deep Learning? , 2013 .

[30] Honglak Lee,et al. Convolutional deep belief networks for scalable unsupervised learning of hierarchical representations , 2009, ICML '09.

[31] Shengcai Liao,et al. Robust Multi-resolution Pedestrian Detection in Traffic Scenes , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[32] Bernt Schiele,et al. Detection and Tracking of Occluded People , 2014, International Journal of Computer Vision.

[33] Pascal Vincent,et al. Representation Learning: A Review and New Perspectives , 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[34] Dan Levi,et al. Part-Based Feature Synthesis for Human Detection , 2010, ECCV.

[35] Xiaogang Wang,et al. Multi-stage Contextual Deep Learning for Pedestrian Detection , 2013, 2013 IEEE International Conference on Computer Vision.

[36] Camille Couprie,et al. Learning Hierarchical Features for Scene Labeling , 2013, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[37] Larry S. Davis,et al. Hierarchical Part-Template Matching for Human Detection and Segmentation , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[38] Charless C. Fowlkes,et al. Multiresolution Models for Object Detection , 2010, ECCV.

[39] Xiaogang Wang,et al. Partial Occlusion Handling in Pedestrian Detection With a Deep Model , 2016, IEEE Transactions on Circuits and Systems for Video Technology.

[40] Deva Ramanan,et al. Exploring Weak Stabilization for Motion Feature Extraction , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[41] Geoffrey E. Hinton,et al. Reducing the Dimensionality of Data with Neural Networks , 2006, Science.

[42] Dariu Gavrila,et al. Multi-cue pedestrian classification with partial occlusion handling , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[43] Fei-Fei Li,et al. Modeling mutual context of object and human pose in human-object interaction activities , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[44] Jing Xiao,et al. Detection Evolution with Multi-order Contextual Co-occurrence , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[45] Bill Triggs,et al. Histograms of oriented gradients for human detection , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[46] Pietro Perona,et al. Pedestrian Detection: An Evaluation of the State of the Art , 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[47] Shuicheng Yan,et al. An HOG-LBP human detector with partial occlusion handling , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[48] Yoshua. Bengio,et al. Learning Deep Architectures for AI , 2007, Found. Trends Mach. Learn..

[49] Fei-Fei Li,et al. ImageNet: A large-scale hierarchical image database , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[50] Bernt Schiele,et al. Learning People Detectors for Tracking in Crowded Scenes , 2013, 2013 IEEE International Conference on Computer Vision.

[51] Geoffrey E. Hinton,et al. ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[52] Luc Van Gool,et al. Pedestrian detection at 100 frames per second , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[53] Yann LeCun,et al. Pedestrian Detection with Unsupervised Multi-stage Feature Learning , 2012, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[54] Jing Xiao,et al. Contextual boost for pedestrian detection , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[55] Jonathon Shlens,et al. Fast, Accurate Detection of 100,000 Object Classes on a Single Machine , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[56] Luc Van Gool,et al. Seeking the Strongest Rigid Detector , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[57] Lin Sun,et al. DL-SFA: Deeply-Learned Slow Feature Analysis for Action Recognition , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[58] Jiwen Lu,et al. Discriminative Deep Metric Learning for Face Verification in the Wild , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[59] Xiaogang Wang,et al. Single-Pedestrian Detection Aided by Two-Pedestrian Detection , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[60] William T. Freeman,et al. Latent hierarchical structural learning for object detection , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[61] Ping Liu,et al. Facial Expression Recognition via a Boosted Deep Belief Network , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[62] David A. McAllester,et al. Object Detection with Discriminatively Trained Part Based Models , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[63] Xiaogang Wang,et al. Modeling Mutual Visibility Relationship in Pedestrian Detection , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[64] Bohyung Han,et al. Improving object localization using macrofeature layout selection , 2011, 2011 IEEE International Conference on Computer Vision Workshops (ICCV Workshops).

[65] Xiaogang Wang,et al. A discriminative deep model for pedestrian detection with occlusion handling , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[66] Yi Yang,et al. Articulated pose estimation with flexible mixtures-of-parts , 2011, CVPR 2011.

[67] Luc Van Gool,et al. Handling Occlusions with Franken-Classifiers , 2013, 2013 IEEE International Conference on Computer Vision.

[68] Deva Ramanan,et al. Detecting Actions, Poses, and Objects with Relational Phraselets , 2012, ECCV.

[69] Xiang Zhang,et al. OverFeat: Integrated Recognition, Localization and Detection using Convolutional Networks , 2013, ICLR.

[70] Anton van den Hengel,et al. Training Effective Node Classifiers for Cascade Classification , 2013, International Journal of Computer Vision.

[71] Luc Van Gool,et al. Depth and Appearance for Mobile Scene Analysis , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[72] Geoffrey E. Hinton,et al. On deep generative models with applications to recognition , 2011, CVPR 2011.

[73] Larry S. Davis,et al. Human detection using partial least squares analysis , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[74] Larry S. Davis,et al. Bilattice-based Logical Reasoning for Human Detection , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[75] Pietro Perona,et al. Fast Feature Pyramids for Object Detection , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[76] Paul A. Viola,et al. Detecting Pedestrians Using Patterns of Motion and Appearance , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.