A Multi-modal Graphical Model for Scene Analysis

In this paper, we introduce a multi-modal graphical model to address the problems of semantic segmentation using 2D-3D data exhibiting extensive many-to-one correspondences. Existing methods often impose a hard correspondence between the 2D and 3D data, where the 2D and 3D corresponding regions are forced to receive identical labels. This results in performance degradation due to misalignments, 3D-2D projection errors and occlusions. We address this issue by defining a graph over the entire set of data that models soft correspondences between the two modalities. This graph encourages each region in a modality to leverage the information from its corresponding regions in the other modality to better estimate its class label. We evaluate our method on a publicly available dataset and beat the state-of-the-art. Additionally, to demonstrate the ability of our model to support multiple correspondences for objects in 3D and 2D domains, we introduce a new multi-modal dataset, which is composed of panoramic images and LIDAR data, and features a rich set of many-to-one correspondences.

[1]  Ruigang Yang,et al.  Semantic Segmentation of Urban Scenes Using Dense Depth Maps , 2010, ECCV.

[2]  Bastian Leibe,et al.  Joint 2D-3D temporally consistent semantic segmentation of street scenes , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[3]  Toby P. Breckon,et al.  Automatic Road Environment Classification , 2011, IEEE Transactions on Intelligent Transportation Systems.

[4]  Martial Hebert,et al.  Co-inference for Multi-modal Scene Analysis , 2012, ECCV.

[5]  Alex Brooks,et al.  A 3D laser and vision based classifier , 2009, 2009 International Conference on Intelligent Sensors, Sensor Networks and Information Processing (ISSNIP).

[6]  Sanja Fidler,et al.  Holistic Scene Understanding for 3D Object Detection with RGBD Cameras , 2013, 2013 IEEE International Conference on Computer Vision.

[7]  Michael I. Jordan,et al.  Loopy Belief Propagation for Approximate Inference: An Empirical Study , 1999, UAI.

[8]  Pushmeet Kohli,et al.  Inference Methods for CRFs with Co-occurrence Statistics , 2012, International Journal of Computer Vision.

[9]  O. Barinova,et al.  NON-ASSOCIATIVE MARKOV NETWORKS FOR 3D POINT CLOUD CLASSIFICATION , 2010 .

[10]  Martial Hebert,et al.  3-D scene analysis via sequenced predictions over points and regions , 2011, 2011 IEEE International Conference on Robotics and Automation.

[11]  David G. Lowe,et al.  Distinctive Image Features from Scale-Invariant Keypoints , 2004, International Journal of Computer Vision.

[12]  Ali Shahrokni,et al.  Urban 3D semantic modelling using stereo vision , 2013, 2013 IEEE International Conference on Robotics and Automation.

[13]  Thorsten Joachims,et al.  Semantic Labeling of 3D Point Clouds for Indoor Scenes , 2011, NIPS.

[14]  Dorin Comaniciu,et al.  Mean Shift: A Robust Approach Toward Feature Space Analysis , 2002, IEEE Trans. Pattern Anal. Mach. Intell..

[15]  Jianxiong Xiao,et al.  Multiple view semantic segmentation for street view images , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[16]  Lars Petersson,et al.  Classification of natural scene multi spectral images using a new enhanced CRF , 2013, 2013 IEEE/RSJ International Conference on Intelligent Robots and Systems.

[17]  Dieter Fox,et al.  Object recognition with hierarchical kernel descriptors , 2011, CVPR 2011.

[18]  Sebastian Thrun,et al.  Towards 3D object recognition via classification of arbitrary object tracks , 2011, 2011 IEEE International Conference on Robotics and Automation.

[19]  Andreas Geiger,et al.  Vision meets robotics: The KITTI dataset , 2013, Int. J. Robotics Res..

[20]  Jana Kosecka,et al.  Semantic segmentation with heterogeneous sensor coverages , 2014, 2014 IEEE International Conference on Robotics and Automation (ICRA).

[21]  Matthijs C. Dorst Distinctive Image Features from Scale-Invariant Keypoints , 2011 .

[22]  Dieter Fox,et al.  Detection-based object labeling in 3D scenes , 2012, 2012 IEEE International Conference on Robotics and Automation.

[23]  W. F. Clocksin,et al.  Joint Optimization for Object Class Segmentation and Dense Stereo Reconstruction , 2012, International Journal of Computer Vision.

[24]  Nico Blodow,et al.  Fast Point Feature Histograms (FPFH) for 3D registration , 2009, 2009 IEEE International Conference on Robotics and Automation.

[25]  Pushmeet Kohli,et al.  Robust Higher Order Potentials for Enforcing Label Consistency , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[26]  Thorsten Joachims,et al.  Contextually Guided Semantic Labeling and Search for 3D Point Clouds , 2011, ArXiv.

[27]  Paul Newman,et al.  Fast Probabilistic Labeling of City Maps , 2008, Robotics: Science and Systems.

[28]  Derek Hoiem,et al.  Indoor Segmentation and Support Inference from RGBD Images , 2012, ECCV.