Pedestrian detection combining RGB and dense LIDAR data

Why is pedestrian detection still very challenging in realistic scenes? How much would a successful solution to monocular depth inference aid pedestrian detection? To answer these questions, we trained a state-of-the-art deformable parts detector on different configurations of optical images and their associated 3D point clouds, both jointly and independently, using the recently released KITTI dataset. We propose novel strategies for depth upsampling and contextual fusion that together yield detection performance exceeding that of RGB-only systems. Our results suggest that depth cues are a very promising mid-level target for future pedestrian detection approaches.
