Learning Pairwise Relationship for Multi-object Detection in Crowded Scenes

Non-maximum suppression (GreedyNMS) has served as the post-processing step in most object detectors for many years. It is efficient and accurate in sparse scenes, but suffers an inevitable trade-off between precision and recall in crowded scenes. To overcome this drawback, we propose Pairwise-NMS to cure GreedyNMS. Specifically, a deep pairwise-relationship network is trained to predict whether two overlapping proposal boxes contain two objects or zero/one object, which allows multiple overlapping objects to be handled effectively. By coupling neatly with GreedyNMS without losing efficiency, Pairwise-NMS achieves consistent improvements on heavily occluded datasets, including MOT15, TUD-Crossing, and PETS. In addition, Pairwise-NMS can be integrated into any learning-based detector (both Faster-RCNN and DPM are tested in this paper), thus building a bridge between GreedyNMS and end-to-end learning detectors.
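To make the coupling with GreedyNMS concrete, the sketch below shows one way a learned pairwise predictor could be consulted inside the greedy suppression loop. This is a minimal illustration under stated assumptions, not the paper's implementation: `predict_two_objects` is a hypothetical stand-in for the pairwise-relationship network, and the (x1, y1, x2, y2) box format and 0.5 IoU threshold are choices made for the example.

```python
import numpy as np

def iou(box_a, box_b):
    """Intersection-over-union of two boxes in (x1, y1, x2, y2) format."""
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def pairwise_nms(boxes, scores, predict_two_objects, iou_thresh=0.5):
    """GreedyNMS whose suppression decision for overlapping pairs is
    deferred to a pairwise predictor.

    predict_two_objects(box_a, box_b) stands in for the learned
    pairwise-relationship network; it returns True when the two
    overlapping boxes are predicted to contain two distinct objects.
    """
    order = np.argsort(scores)[::-1]          # highest score first
    suppressed = np.zeros(len(boxes), dtype=bool)
    keep = []
    for rank, i in enumerate(order):
        if suppressed[i]:
            continue
        keep.append(i)
        for j in order[rank + 1:]:            # only lower-scored boxes
            if suppressed[j] or iou(boxes[i], boxes[j]) <= iou_thresh:
                continue
            # Plain GreedyNMS would suppress box j unconditionally here.
            # Instead, rescue j when the pair is predicted to cover two
            # distinct objects (the crowded-scene case).
            if not predict_two_objects(boxes[i], boxes[j]):
                suppressed[j] = True
    return keep

# Toy usage with trivial stand-in predictors: boxes 0 and 1 overlap
# heavily, box 2 is isolated.
boxes = np.array([[0, 0, 10, 10], [1, 1, 11, 11], [50, 50, 60, 60]], float)
scores = np.array([0.9, 0.8, 0.7])
print(pairwise_nms(boxes, scores, lambda a, b: True))   # -> [0, 1, 2]
print(pairwise_nms(boxes, scores, lambda a, b: False))  # -> [0, 2]
```

Note that when the predictor always answers "one object" (False), the procedure degenerates to plain GreedyNMS, which is why the coupling costs essentially nothing in sparse scenes.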
