A robust RGB-D SLAM system for 3D environment with planar surfaces

With the increasing popularity of RGB-depth (RGB-D) sensors such as the Microsoft Kinect, there has been much research on capturing and reconstructing 3D environments using a movable RGB-D sensor. The key process behind these simultaneous localization and mapping (SLAM) systems is the iterative closest point (ICP) algorithm, an iterative algorithm that estimates the rigid movement of the camera from the captured 3D point clouds. While ICP is well studied, it is problematic when used to scan large planar regions such as the wall surfaces of a room: the lack of depth variation on planar surfaces makes the global alignment an ill-conditioned problem. In this paper, we present a novel approach for registering 3D point clouds that combines color and depth information. Instead of directly searching for point correspondences among the 3D data, the proposed method first extracts features from the RGB images and then back-projects them into 3D space to identify more reliable correspondences. These color correspondences form the initial input to the ICP procedure, which then refines the alignment. Experimental results show that the proposed approach can achieve better accuracy than existing SLAM systems in reconstructing indoor environments with large planar surfaces.
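
As an illustration only, the pipeline summarized above (back-project matched RGB features to 3D, estimate an initial rigid motion from them, then refine with ICP) might be sketched as follows. This is not the authors' implementation. The function names, the pinhole intrinsics `fx`, `fy`, `cx`, `cy`, the SVD-based closed-form alignment, the point-to-point ICP variant, and the brute-force nearest-neighbour search are all assumptions made for this sketch. Feature detection and matching in the RGB images are assumed to have been done already.

```python
import numpy as np


def back_project(pixels, depths, fx, fy, cx, cy):
    """Lift (u, v) pixel features with metric depth to 3D camera coordinates.

    Assumes an ideal pinhole camera; fx, fy, cx, cy are hypothetical intrinsics.
    """
    u, v = pixels[:, 0], pixels[:, 1]
    x = (u - cx) * depths / fx
    y = (v - cy) * depths / fy
    return np.column_stack([x, y, depths])


def rigid_transform(src, dst):
    """Least-squares R, t such that dst ~ src @ R.T + t.

    Uses the SVD (Kabsch) closed form on paired points.
    """
    src_c, dst_c = src.mean(axis=0), dst.mean(axis=0)
    H = (src - src_c).T @ (dst - dst_c)          # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so that R is a proper rotation (det = +1).
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = dst_c - R @ src_c
    return R, t


def icp(src, dst, R, t, iterations=20):
    """Point-to-point ICP refinement starting from an initial (R, t).

    Brute-force nearest neighbours; only suitable for small clouds.
    """
    for _ in range(iterations):
        moved = src @ R.T + t
        # Squared distances between every moved source point and every target.
        d2 = ((moved[:, None, :] - dst[None, :, :]) ** 2).sum(axis=2)
        nearest = dst[d2.argmin(axis=1)]
        R, t = rigid_transform(src, nearest)
    return R, t
```

In use, the matched feature pixels of two frames would each be passed through `back_project`, the resulting 3D pairs given to `rigid_transform` to obtain the initial estimate, and that estimate handed to `icp` together with the full point clouds. Seeding ICP with color-derived correspondences is what keeps the refinement from sliding along a wall, where geometry alone does not constrain in-plane motion.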
