RGB-D mapping: Using Kinect-style depth cameras for dense 3D modeling of indoor environments

RGB-D cameras (such as the Microsoft Kinect) are novel sensing systems that capture RGB images along with per-pixel depth information. In this paper we investigate how such cameras can be used for building dense 3D maps of indoor environments. Such maps have applications in robot navigation, manipulation, semantic mapping, and telepresence. We present RGB-D Mapping, a full 3D mapping system that utilizes a novel joint optimization algorithm combining visual features and shape-based alignment. Visual and depth information are also combined for view-based loop-closure detection, followed by pose optimization to achieve globally consistent maps. We evaluate RGB-D Mapping on two large indoor environments, and show that it effectively combines the visual and shape information available from RGB-D cameras.
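The abstract does not spell out the joint optimization, so the following is only a minimal sketch of one way a visual-feature term and a shape-based term could be combined into a single alignment error. The function names, the weight `alpha`, the point-to-plane form of the shape term, and the input layout are all assumptions made for illustration, not the authors' formulation.

```python
def transform(R, t, p):
    """Apply the rigid transform (R, t) to a 3D point p."""
    return tuple(sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3))


def joint_alignment_error(R, t, feature_pairs, dense_pairs, alpha=0.5):
    """Hypothetical joint error: alpha * feature term + (1 - alpha) * shape term.

    feature_pairs: list of (src_point, dst_point) for matched visual features
                   that have been back-projected to 3D using the depth image.
    dense_pairs:   list of (src_point, dst_point, dst_normal) for dense
                   shape correspondences (assumed to be given).
    """
    # Sparse visual term: mean squared distance between matched 3D features.
    feature_term = 0.0
    for src, dst in feature_pairs:
        moved = transform(R, t, src)
        feature_term += sum((m - d) ** 2 for m, d in zip(moved, dst))
    feature_term /= max(len(feature_pairs), 1)

    # Dense shape term: mean squared point-to-plane distance.
    shape_term = 0.0
    for src, dst, normal in dense_pairs:
        moved = transform(R, t, src)
        signed_dist = sum((m - d) * n for m, d, n in zip(moved, dst, normal))
        shape_term += signed_dist ** 2
    shape_term /= max(len(dense_pairs), 1)

    return alpha * feature_term + (1 - alpha) * shape_term


# Usage: identity rotation, one-unit shift along z.
IDENTITY = ((1, 0, 0), (0, 1, 0), (0, 0, 1))
features = [((0, 0, 0), (0, 0, 0)), ((1, 0, 0), (1, 0, 0))]
dense = [((0, 1, 0), (0, 1, 0), (0, 0, 1))]
error = joint_alignment_error(IDENTITY, (0, 0, 1), features, dense)
```

In this sketch, setting `alpha` to 1 uses only the visual feature matches, while setting it to 0 reduces the error to a purely shape-based (ICP-style) term. A full system would minimize this error over `R` and `t` rather than just evaluate it.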
