Fusion Based Holistic Road Scene Understanding

This paper addresses the problem of holistic road scene understanding by integrating visual and range data. To this end, we propose an approach that jointly tackles object-level image segmentation and semantic region labeling within a conditional random field (CRF) framework. Specifically, we first generate semantic object hypotheses by clustering 3D points, learning their prior appearance models, and using a deep learning method to infer their semantic categories. The learned priors, together with spatial and geometric contexts, are incorporated into the CRF. With this formulation, visual and range data are tightly fused, and the coupled segmentation and semantic labeling problem can be solved via Graph Cuts. Our approach is validated on the challenging KITTI dataset, which contains diverse and complex road scenarios. Both quantitative and qualitative evaluations demonstrate its effectiveness.
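
For orientation, a CRF labeling problem of this kind is commonly written as the minimization of an energy with unary and pairwise terms. The block below is a generic sketch, not the paper's actual formulation: the symbols $\psi_i$, $\psi_{ij}$, $\lambda$, $w_{ij}$, the graph $(\mathcal{V}, \mathcal{E})$, and the choice of a Potts pairwise term are all assumptions introduced here for illustration.

```latex
% Generic pairwise CRF energy (illustrative; not taken from the paper).
% y_i       : label assigned to node i (e.g. a pixel or superpixel)
% \psi_i    : unary cost, e.g. derived from learned appearance priors
% \psi_{ij} : pairwise cost encouraging neighboring nodes to agree
% \lambda   : assumed weight balancing the two terms
\begin{align}
E(\mathbf{y}) &= \sum_{i \in \mathcal{V}} \psi_i(y_i)
  + \lambda \sum_{(i,j) \in \mathcal{E}} \psi_{ij}(y_i, y_j), \\
\psi_{ij}(y_i, y_j) &= w_{ij}\,[\,y_i \neq y_j\,], \qquad w_{ij} \ge 0, \\
\mathbf{y}^{*} &= \arg\min_{\mathbf{y}} E(\mathbf{y}).
\end{align}
```

Under these assumptions, a minimum cut gives the exact minimizer when the labels are binary, and move-making schemes such as alpha-expansion give an approximate minimizer in the multi-label case.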
