Building 3D semantic maps for mobile robots using RGB-D camera

The wide availability of affordable RGB-D sensors changes the landscape of indoor scene analysis. Years of research on simultaneous localization and mapping (SLAM) have made it possible to merge multiple RGB-D images into a single point cloud and provide a 3D model for a complete indoor scene. However, these reconstructed models only have geometry information, not including semantic knowledge. The advancements in robot autonomy and capabilities for carrying out more complex tasks in unstructured environments can be greatly enhanced by endowing environment models with semantic knowledge. Towards this goal, we propose a novel approach to generate 3D semantic maps for an indoor scene. Our approach creates a 3D reconstructed map from a RGB-D image sequence firstly, then we jointly infer the semantic object category and structural class for each point of the global map. 12 object categories (e.g. walls, tables, chairs) and 4 structural classes (ground, structure, furniture and props) are labeled in the global map. In this way, we can totally understand both the object and structure information. In order to get semantic information, we compute semantic segmentation for each RGB-D image and merge the labeling results by a Dense Conditional Random Field. Different from previous techniques, we use temporal information and higher-order cliques to enforce the label consistency for each image labeling result. Our experiments demonstrate that temporal information and higher-order cliques are significant for the semantic mapping procedure and can improve the precision of the semantic mapping results.

[1]  Joachim Hertzberg,et al.  Towards semantic maps for mobile robots , 2008, Robotics Auton. Syst..

[2]  Charless C. Fowlkes,et al.  Contour Detection and Hierarchical Image Segmentation , 2011, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[3]  Xiaoping Chen,et al.  Semantic mapping for object category and structural class , 2014, 2014 IEEE/RSJ International Conference on Intelligent Robots and Systems.

[4]  Thorsten Joachims,et al.  Semantic Labeling of 3D Point Clouds for Indoor Scenes , 2011, NIPS.

[5]  Wolfram Burgard,et al.  Efficient estimation of accurate maximum likelihood maps in 3D , 2007, 2007 IEEE/RSJ International Conference on Intelligent Robots and Systems.

[6]  John J. Leonard,et al.  Kintinuous: Spatially Extended KinectFusion , 2012, AAAI 2012.

[7]  Luc Van Gool,et al.  SURF: Speeded Up Robust Features , 2006, ECCV.

[8]  Andrew W. Fitzgibbon,et al.  KinectFusion: Real-time dense surface mapping and tracking , 2011, 2011 10th IEEE International Symposium on Mixed and Augmented Reality.

[9]  Yingfeng Chen,et al.  A probabilistic, variable-resolution and effective quadtree representation for mapping of large environments , 2015, 2015 International Conference on Advanced Robotics (ICAR).

[10]  Wolfram Burgard,et al.  Real-time 3D visual SLAM with a hand-held camera , 2011 .

[11]  Derek Hoiem,et al.  Indoor Segmentation and Support Inference from RGBD Images , 2012, ECCV.

[12]  Dieter Fox,et al.  RGB-D mapping: Using Kinect-style depth cameras for dense 3D modeling of indoor environments , 2012, Int. J. Robotics Res..

[13]  Bastian Leibe,et al.  Dense 3D semantic mapping of indoor scenes from RGB-D images , 2014, 2014 IEEE International Conference on Robotics and Automation (ICRA).

[14]  Cristian Sminchisescu,et al.  CPMC-3D-O2P: Semantic segmentation of RGB-D images using CPMC and Second Order Pooling , 2013, ArXiv.

[15]  Nathan Silberman,et al.  Indoor scene segmentation using a structured light sensor , 2011, 2011 IEEE International Conference on Computer Vision Workshops (ICCV Workshops).

[16]  Jörg Stückler,et al.  Semantic mapping using object-class segmentation of RGB-D images , 2012, 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems.

[17]  Ali Shahrokni,et al.  Mesh Based Semantic Modelling for Indoor and Outdoor Scenes , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[18]  W. Burgard,et al.  Real-time 3 D visual SLAM with a hand-held RGB-D camera , 2011 .

[19]  Dieter Fox,et al.  RGB-(D) scene labeling: Features and algorithms , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[20]  Vladlen Koltun,et al.  Efficient Inference in Fully Connected CRFs with Gaussian Edge Potentials , 2011, NIPS.

[21]  Yann LeCun,et al.  Indoor Semantic Segmentation using depth information , 2013, ICLR.

[22]  Daniel P. Huttenlocher,et al.  Efficient Graph-Based Image Segmentation , 2004, International Journal of Computer Vision.

[23]  Carlo Tomasi,et al.  Good features to track , 1994, 1994 Proceedings of IEEE Conference on Computer Vision and Pattern Recognition.

[24]  Jitendra Malik,et al.  Perceptual Organization and Recognition of Indoor Scenes from RGB-D Images , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[25]  Takeo Kanade,et al.  An Iterative Image Registration Technique with an Application to Stereo Vision , 1981, IJCAI.

[26]  Chenliang Xu,et al.  Streaming Hierarchical Video Segmentation , 2012, ECCV.

[27]  Dieter Fox,et al.  RGB-D Mapping: Using Depth Cameras for Dense 3D Modeling of Indoor Environments , 2010, ISER.