Paper Information - Machine Learning Based Bounding Box Regression for Improved Pedestrian Detection

Most studies on pedestrian and passenger detection focus on end-to-end learning, improving either the features used or the detectors themselves. An important step in these systems is non-maximum suppression (NMS), which aims to reduce the proposed bounding boxes that are supposed to belong to the same target through a greedy regional search and clustering. To improve the performance of NMS, recent approaches use only the bounding boxes and their scores. Following this path, this study proposes a novel machine learning based bounding box regression approach. During the training phase, the proposed system uses the position, size, and confidence scores of the bounding boxes as features and the same information from the corresponding ground truth (except the score) as the desired output. In this way, a pattern between the initially generated bounding boxes and the ground truth is revealed. Several tests and experiments have been performed, and the results show that the developed system can be particularly effective when correct decisions are needed at low overlap ratios (such as applications with strong occlusion), without increasing false positives.
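To make the two ingredients mentioned in the abstract concrete, the following is a minimal Python sketch, not the authors' implementation. It shows a conventional greedy NMS and one possible way to assemble (features, target) pairs for a bounding box regressor. The `(x, y, w, h)` box layout, the function names `iou`, `greedy_nms`, and `build_training_pairs`, the thresholds, and the rule of matching each detection to its highest-IoU ground truth box are all assumptions made for illustration. The abstract does not specify how detections are matched to the ground truth or which regressor is used.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]               # (x, y, w, h), assumed layout
Detection = Tuple[float, float, float, float, float]  # (x, y, w, h, score)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two (x, y, w, h) boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    iw = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    ih = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = iw * ih
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(dets: List[Detection], thresh: float = 0.5) -> List[Detection]:
    """Keep the highest-scoring box, drop boxes that overlap it, repeat."""
    remaining = sorted(dets, key=lambda d: d[4], reverse=True)
    kept: List[Detection] = []
    while remaining:
        best = remaining.pop(0)
        kept.append(best)
        remaining = [d for d in remaining if iou(best[:4], d[:4]) <= thresh]
    return kept


def build_training_pairs(
    dets: List[Detection], truths: List[Box], min_iou: float = 0.3
) -> List[Tuple[Detection, Box]]:
    """Pair each detection (features) with its best-overlapping ground truth (target).

    Matching by highest IoU is an assumption of this sketch.
    """
    pairs: List[Tuple[Detection, Box]] = []
    if not truths:
        return pairs
    for d in dets:
        best = max(truths, key=lambda g: iou(d[:4], g))
        if iou(d[:4], best) >= min_iou:
            pairs.append((d, best))
    return pairs


if __name__ == "__main__":
    dets = [(10, 10, 50, 100, 0.9), (12, 12, 50, 100, 0.8), (200, 20, 40, 90, 0.7)]
    truths = [(11, 11, 50, 100), (198, 22, 40, 90)]
    print(len(greedy_nms(dets)))                    # 2 boxes survive suppression
    print(len(build_training_pairs(dets, truths)))  # 3 (features, target) pairs
```

Each resulting pair holds a five-value feature vector (position, size, score) and a four-value target (position, size), so any multi-output regressor could in principle be fitted on them. Which learner suits the task is outside the scope of this sketch.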
