Pedestrian proposal generation using depth-aware scale estimation

In this work, we propose an efficient method for generating pedestrian proposals suited to autonomous vehicles. Our main intuition is that depth information provides an important cue for assigning the scale of pedestrian proposals. Based on the observation that pedestrians have nearly the same physical size in 3-D world coordinates, we derive the scale of each pedestrian patch by projecting a 3-D model onto the image plane at its corresponding depth. We also introduce a scale-aware binary descriptor computed from both color and depth images. Using this descriptor, regression models are trained to rank the pedestrian proposal candidates and to adjust the proposal bounding boxes for accurate localization. Our algorithm achieves significant performance gains over conventional proposal generation methods on the challenging KITTI dataset.
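
The depth-to-scale relation described above can be illustrated with the standard pinhole camera model, in which an object of physical height H at depth Z projects to roughly f * H / Z pixels for a focal length f in pixels. The following is a minimal sketch of that idea only, not the method's actual implementation: the function name `proposal_box`, the nominal pedestrian height of 1.7 m, the width-to-height ratio of 0.41, and the focal length in the usage line are all illustrative assumptions that do not come from the abstract.

```python
from dataclasses import dataclass


@dataclass
class Box:
    x: float  # left edge in pixels
    y: float  # top edge in pixels
    w: float  # width in pixels
    h: float  # height in pixels


def proposal_box(u, v, depth_m, focal_px, ped_height_m=1.7, aspect=0.41):
    """Return a box centered at pixel (u, v), sized by pinhole projection.

    ped_height_m and aspect are assumed nominal values, not taken from
    the paper.
    """
    if depth_m <= 0:
        raise ValueError("depth must be positive")
    # Pinhole model: image height = focal length * physical height / depth.
    h = focal_px * ped_height_m / depth_m
    w = aspect * h
    return Box(u - w / 2, v - h / 2, w, h)


# Hypothetical usage: a candidate location 10 m away, focal length 700 px.
box = proposal_box(u=600.0, v=180.0, depth_m=10.0, focal_px=700.0)
print(box)
```

Under this model, halving the depth doubles the box height, so a single depth value per candidate location fixes the proposal scale without an exhaustive search over scales.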
