Label Propagation for Large Scale 3D Indoor Scenes

RGB-D mapping or semantic mapping is becoming more and more important for computer vision and robotics. However, manually segmenting and generating semantic labels for RGB-D image sequence or global point cloud will cost a lot of human labors. That is why there still lacks a satisfactory indoor dataset for testing semantic mapping system. While automatic label propagation can help, almost all existing methods were designed for 2D videos which ignore the 3D characteristics of RGB-D images. In this paper, we build a global map for RGB-D image sequence firstly, and then propagate labels on the global map. In this way, we can enforce label consistency over the global scene and require fewer frames to be manually labeled. Also we model the overlap information between images and use a greedy algorithm to automatically choose frames for manual labeling. Experiments demonstrate that our method can reduce manual efforts greatly. For a scene which contains 1831 images, only 22 labeled images can achieve 93 % accuracy for label propagation.

[1]  M. Hebert,et al.  Efficient temporal consistency for streaming video scene analysis , 2013, 2013 IEEE International Conference on Robotics and Automation.

[2]  Jitendra Malik,et al.  Perceptual Organization and Recognition of Indoor Scenes from RGB-D Images , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[3]  Takeo Kanade,et al.  An Iterative Image Registration Technique with an Application to Stereo Vision , 1981, IJCAI.

[4]  Derek Hoiem,et al.  Indoor Segmentation and Support Inference from RGBD Images , 2012, ECCV.

[5]  Wolfram Burgard,et al.  Real-time 3D visual SLAM with a hand-held camera , 2011 .

[6]  Joachim Hertzberg,et al.  Towards semantic maps for mobile robots , 2008, Robotics Auton. Syst..

[7]  Andrew Owens,et al.  SUN3D: A Database of Big Spaces Reconstructed Using SfM and Object Labels , 2013, 2013 IEEE International Conference on Computer Vision.

[8]  Yann LeCun,et al.  Indoor Semantic Segmentation using depth information , 2013, ICLR.

[9]  Daniel P. Huttenlocher,et al.  Efficient Graph-Based Image Segmentation , 2004, International Journal of Computer Vision.

[10]  Carlo Tomasi,et al.  Good features to track , 1994, 1994 Proceedings of IEEE Conference on Computer Vision and Pattern Recognition.

[11]  Vladlen Koltun,et al.  Efficient Inference in Fully Connected CRFs with Gaussian Edge Potentials , 2011, NIPS.

[12]  Bastian Leibe,et al.  Joint 2D-3D temporally consistent semantic segmentation of street scenes , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[13]  Jörg Stückler,et al.  Semantic mapping using object-class segmentation of RGB-D images , 2012, 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems.

[14]  Andrew W. Fitzgibbon,et al.  KinectFusion: Real-time dense surface mapping and tracking , 2011, 2011 10th IEEE International Symposium on Mixed and Augmented Reality.

[15]  Ali Shahrokni,et al.  Mesh Based Semantic Modelling for Indoor and Outdoor Scenes , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[16]  Thorsten Joachims,et al.  Semantic Labeling of 3D Point Clouds for Indoor Scenes , 2011, NIPS.

[17]  John J. Leonard,et al.  Kintinuous: Spatially Extended KinectFusion , 2012, AAAI 2012.

[18]  Kristen Grauman,et al.  Active Frame Selection for Label Propagation in Videos , 2012, ECCV.

[19]  Nathan Silberman,et al.  Indoor scene segmentation using a structured light sensor , 2011, 2011 IEEE International Conference on Computer Vision Workshops (ICCV Workshops).

[20]  Roberto Cipolla,et al.  Assisted Video Object Labeling By Joint Tracking of Regions and Keypoints , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[21]  Dieter Fox,et al.  RGB-(D) scene labeling: Features and algorithms , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[22]  Bastian Leibe,et al.  Dense 3D semantic mapping of indoor scenes from RGB-D images , 2014, 2014 IEEE International Conference on Robotics and Automation (ICRA).

[23]  Cristian Sminchisescu,et al.  CPMC-3D-O2P: Semantic segmentation of RGB-D images using CPMC and Second Order Pooling , 2013, ArXiv.

[24]  Dieter Fox,et al.  RGB-D mapping: Using Kinect-style depth cameras for dense 3D modeling of indoor environments , 2012, Int. J. Robotics Res..