MonoFusion: Real-time 3D reconstruction of small scenes with a single web camera

MonoFusion allows a user to build dense 3D reconstructions of their environment in real-time, utilizing only a single, off-the-shelf web camera as the input sensor. The camera could be one already available in a tablet, phone, or a standalone device. No additional input hardware is required. This removes the need for power intensive active sensors that do not work robustly in natural outdoor lighting. Using the input stream of the camera we first estimate the 6DoF camera pose using a sparse tracking method. These poses are then used for efficient dense stereo matching between the input frame and a key frame (extracted previously). The resulting dense depth maps are directly fused into a voxel-based implicit model (using a computationally inexpensive method) and surfaces are extracted per frame. The system is able to recover from tracking failures as well as filter out geometrically inconsistent noise from the 3D reconstruction. Our method is both simple to implement and efficient, making such systems even more accessible. This paper details the algorithmic components that make up our system and a GPU implementation of our approach. Qualitative results demonstrate high quality reconstructions even visually comparable to active depth sensor-based systems such as KinectFusion.

[1]  Selim Benhimane,et al.  Real-time image-based tracking of planes using efficient second-order minimization , 2004, 2004 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) (IEEE Cat. No.04CH37566).

[2]  H. Hirschmüller Accurate and Efficient Stereo Processing by Semi-Global Matching and Mutual Information , 2005, CVPR.

[3]  Andrew J. Davison,et al.  Live dense reconstruction with a single moving camera , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[4]  Manolis I. A. Lourakis,et al.  SBA: A software package for generic sparse bundle adjustment , 2009, TOMS.

[5]  Richard Szeliski,et al.  A Comparison and Evaluation of Multi-View Stereo Reconstruction Algorithms , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[6]  Luc Van Gool,et al.  In-hand scanning with online loop closure , 2009, 2009 IEEE 12th International Conference on Computer Vision Workshops, ICCV Workshops.

[7]  Philip L. Davidson,et al.  Real-time stereo vision using semi-global matching on programmable graphics hardware , 2006, SIGGRAPH '06.

[8]  Heiko Hirschmüller,et al.  Stereo Processing by Semiglobal Matching and Mutual Information , 2008, IEEE Trans. Pattern Anal. Mach. Intell..

[9]  G. Klein,et al.  Parallel Tracking and Mapping for Small AR Workspaces , 2007, 2007 6th IEEE and ACM International Symposium on Mixed and Augmented Reality.

[10]  Johannes Stallkamp,et al.  Real-time stereo vision: Optimizing Semi-Global Matching , 2013, 2013 IEEE Intelligent Vehicles Symposium (IV).

[11]  Andrew J. Davison,et al.  DTAM: Dense tracking and mapping in real-time , 2011, 2011 International Conference on Computer Vision.

[12]  Hendrik P. A. Lensch,et al.  Scale Robust Multi View Stereo , 2012, ECCV.

[13]  Jan-Michael Frahm,et al.  Detailed Real-Time Urban 3D Reconstruction from Video , 2007, International Journal of Computer Vision.

[14]  Tom Drummond,et al.  Machine Learning for High-Speed Corner Detection , 2006, ECCV.

[15]  Neil A. Dodgson,et al.  Real-Time Spatiotemporal Stereo Matching Using the Dual-Cross-Bilateral Grid , 2010, ECCV.

[16]  Matthijs C. Dorst Distinctive Image Features from Scale-Invariant Keypoints , 2011 .

[17]  David Nistér,et al.  An efficient solution to the five-point relative pose problem , 2003, 2003 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2003. Proceedings..

[18]  Jörg Stückler,et al.  Robust Real-Time Registration of RGB-D Images using Multi-Resolution Surfel Representations , 2012, ROBOTIK.

[19]  Dieter Fox,et al.  RGB-D Mapping: Using Depth Cameras for Dense 3D Modeling of Indoor Environments , 2010, ISER.

[20]  Carsten Rother,et al.  PatchMatch Stereo - Stereo Matching with Slanted Support Windows , 2011, BMVC.

[21]  Rudy Lauwereins,et al.  Joint integral histograms and its application in stereo matching , 2010, 2010 IEEE International Conference on Image Processing.

[22]  Horst Bischof,et al.  A Globally Optimal Algorithm for Robust TV-L1 Range Image Integration , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[23]  C. Zach Fast and High Quality Fusion of Depth Maps , 2008 .

[24]  Marc Levoy,et al.  The digital Michelangelo project: 3D scanning of large statues , 2000, SIGGRAPH.

[25]  Andrew W. Fitzgibbon,et al.  KinectFusion: Real-time dense surface mapping and tracking , 2011, 2011 10th IEEE International Symposium on Mixed and Augmented Reality.

[26]  Hang-Bong Kang,et al.  Constant Time Stereo Matching , 2009, 2009 13th International Machine Vision and Image Processing Conference.

[27]  Reinhard Koch,et al.  Visual Modeling with a Hand-Held Camera , 2004, International Journal of Computer Vision.

[28]  Andrew W. Fitzgibbon,et al.  KinectFusion: real-time 3D reconstruction and interaction using a moving depth camera , 2011, UIST.

[29]  Jan-Michael Frahm,et al.  Real-Time Visibility-Based Fusion of Depth Maps , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[30]  Ines Ernst,et al.  Mutual Information Based Semi-Global Stereo Matching on the GPU , 2008, ISVC.

[31]  Jennifer A. Scott,et al.  Algorithm 891: A Fortran virtual memory system , 2009, TOMS.

[32]  Robert C. Bolles,et al.  Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography , 1981, CACM.

[33]  Wolfram Burgard,et al.  OctoMap: an efficient probabilistic 3D mapping framework based on octrees , 2013, Autonomous Robots.

[34]  Marc Levoy,et al.  A volumetric method for building complex models from range images , 1996, SIGGRAPH.

[35]  Carsten Rother,et al.  Fast cost-volume filtering for visual correspondence and beyond , 2011, CVPR 2011.

[36]  Olga Veksler,et al.  Stereo correspondence by dynamic programming on a tree , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[37]  Stefano Mattoccia,et al.  Linear stereo matching , 2011, 2011 International Conference on Computer Vision.

[38]  Richard Szeliski,et al.  A Taxonomy and Evaluation of Dense Two-Frame Stereo Correspondence Algorithms , 2001, International Journal of Computer Vision.