Pedestrian Proximity Detection using RGB-D Data

This paper presents a novel method for pedestrian detection and distance estimation using RGB-D data. We use Mask R-CNN for instance-level pedestrian segmentation, and the Semiglobal Matching algorithm for computing depth information from a pair of infrared images captured by an Intel RealSense D435 stereo vision depth camera. The resulting depth map is post-processed using both spatial and temporal edge-preserving filters and spatial hole-filling to mitigate erroneous or missing depth values. The distance to each pedestrian is estimated using the median depth value of the pixels in the depth map covered by the predicted mask. Unlike previous work, our method is evaluated on, and performs well across, a wide spectrum of outdoor lighting conditions. Our proposed technique is able to detect and estimate the distance of pedestrians within 5m with an average accuracy of 87.7%.

[1]  Kaiming He,et al.  Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[2]  Guy Shani,et al.  Comparing RGB-D Sensors for Close Range Outdoor Agricultural Phenotyping , 2018, Sensors.

[3]  Jake K. Aggarwal,et al.  Human detection using depth information by Kinect , 2011, CVPR 2011 WORKSHOPS.

[4]  Kaiming He,et al.  Mask R-CNN , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[5]  Pietro Perona,et al.  Pedestrian detection: A benchmark , 2009, CVPR.

[6]  Bernt Schiele,et al.  CityPersons: A Diverse Dataset for Pedestrian Detection , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[7]  Kai Oliver Arras,et al.  People detection in RGB-D data , 2011, 2011 IEEE/RSJ International Conference on Intelligent Robots and Systems.

[8]  Richard Green,et al.  Pedestrian avoidance in construction sites , 2017, 2017 International Conference on Image and Vision Computing New Zealand (IVCNZ).

[9]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[10]  Heiko Hirschmüller,et al.  Stereo Processing by Semiglobal Matching and Mutual Information , 2008, IEEE Trans. Pattern Anal. Mach. Intell..

[11]  Pietro Perona,et al.  Microsoft COCO: Common Objects in Context , 2014, ECCV.

[12]  Andreas Geiger,et al.  Vision meets robotics: The KITTI dataset , 2013, Int. J. Robotics Res..

[13]  Bernt Schiele,et al.  Towards Reaching Human Performance in Pedestrian Detection , 2018, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[14]  Matteo Munaro,et al.  Performance evaluation of the 1st and 2nd generation Kinect for multimedia applications , 2015, 2015 IEEE International Conference on Multimedia and Expo (ICME).