Paper information - A Multi-modal Graphical Model for Scene Analysis

A Multi-modal Graphical Model for Scene Analysis

In this paper, we introduce a multi-modal graphical model to address the problem of semantic segmentation using 2D-3D data exhibiting extensive many-to-one correspondences. Existing methods often impose a hard correspondence between the 2D and 3D data, in which corresponding 2D and 3D regions are forced to receive identical labels. This degrades performance in the presence of misalignments, 3D-2D projection errors, and occlusions. We address this issue by defining a graph over the entire set of data that models soft correspondences between the two modalities. This graph encourages each region in one modality to leverage information from its corresponding regions in the other modality to better estimate its class label. We evaluate our method on a publicly available dataset and outperform the state of the art. Additionally, to demonstrate the ability of our model to support multiple correspondences for objects in the 3D and 2D domains, we introduce a new multi-modal dataset, which is composed of panoramic images and LIDAR data and features a rich set of many-to-one correspondences.
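
The difference between hard and soft correspondences can be illustrated with a toy example. The sketch below is not the model from the paper: the node names, the unary costs, the Potts-style disagreement penalty, and the brute-force search are all assumptions chosen only to make the idea concrete. Two hypothetical image regions are linked to one 3D segment (a many-to-one correspondence), and one of the image regions has evidence that conflicts with the segment, as might happen under occlusion or a projection error.

```python
from itertools import product


def energy(labels, unary, edges, weight):
    """Sum of per-node costs plus a penalty for each cross-modal disagreement."""
    cost = sum(unary[node][labels[node]] for node in unary)
    cost += sum(weight for u, v in edges if labels[u] != labels[v])
    return cost


def minimize(unary, edges, weight, num_labels, hard=False):
    """Exhaustively search all labelings (only feasible for tiny graphs).

    hard=True forbids any disagreement across an edge (hard correspondence);
    hard=False allows disagreement at a cost of `weight` (soft correspondence).
    """
    nodes = list(unary)
    best, best_cost = None, float("inf")
    for assignment in product(range(num_labels), repeat=len(nodes)):
        labels = dict(zip(nodes, assignment))
        if hard and any(labels[u] != labels[v] for u, v in edges):
            continue  # hard correspondence: linked regions must share a label
        cost = energy(labels, unary, edges, 0.0 if hard else weight)
        if cost < best_cost:
            best, best_cost = labels, cost
    return best


# Hypothetical costs (lower is better) for labels 0 and 1.
unary = {
    "img_a": [0.1, 3.0],  # 2D region that clearly looks like label 0
    "img_b": [3.0, 0.2],  # 2D region that clearly looks like label 1
    "seg":   [0.5, 1.0],  # 3D segment with a mild preference for label 0
}
# Many-to-one: both image regions correspond to the same 3D segment.
edges = [("img_a", "seg"), ("img_b", "seg")]

soft = minimize(unary, edges, weight=1.0, num_labels=2)
hard = minimize(unary, edges, weight=1.0, num_labels=2, hard=True)
print("soft:", soft)
print("hard:", hard)
```

In this toy setting the hard constraint forces `img_b` to take the label of the segment despite its own strong evidence, whereas the soft penalty lets it keep its label while the other two nodes still agree. The penalty weight of 1.0 is arbitrary; a larger weight moves the soft solution toward the hard one.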
