Live dense reconstruction with a single moving camera

We present a method which enables rapid and dense reconstruction of scenes browsed by a single live camera. We take point-based real-time structure from motion (SFM) as our starting point, generating accurate 3D camera pose estimates and a sparse point cloud. Our main novel contribution is to use an approximate but smooth base mesh generated from the SFM to predict the view at a bundle of poses around automatically selected reference frames spanning the scene, and then warp the base mesh into highly accurate depth maps based on view-predictive optical flow and a constrained scene flow update. The quality of the resulting depth maps means that a convincing global scene model can be obtained simply by placing them side by side and removing overlapping regions. We show that a cluttered indoor environment can be reconstructed from a live hand-held camera in a few seconds, with all processing performed by current desktop hardware. Real-time monocular dense reconstruction opens up many application areas, and we demonstrate both real-time novel view synthesis and advanced augmented reality where augmentations interact physically with the 3D scene and are correctly clipped by occlusions.
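To make the pipeline described above easier to follow, here is a minimal Python sketch of its main stages. All function and field names (e.g. track_camera_sfm, build_base_mesh, constrained_scene_flow_update) are hypothetical placeholders chosen for illustration; they are assumptions about the structure of the method, not the authors' actual implementation or API.

```python
# Hypothetical, high-level sketch of the monocular dense reconstruction pipeline
# described in the abstract. Every helper called here is an assumed placeholder.

def reconstruct_scene(frames):
    # 1. Real-time point-based structure from motion: accurate camera poses
    #    plus a sparse 3D point cloud.
    poses, sparse_points = track_camera_sfm(frames)

    # 2. An approximate but smooth base mesh interpolating the sparse points.
    base_mesh = build_base_mesh(sparse_points)

    depth_maps = []
    # 3. Automatically selected reference frames spanning the scene.
    for ref in select_reference_frames(frames, poses):
        # Predict the view at a bundle of poses around the reference frame
        # by rendering the base mesh.
        predictions = [render(base_mesh, pose) for pose in bundle_around(ref.pose)]

        # 4. View-predictive optical flow between predictions and the real
        #    images, followed by a constrained scene flow update that warps
        #    the base mesh into an accurate depth map for this reference.
        flows = [optical_flow(pred, img)
                 for pred, img in zip(predictions, ref.bundle_images)]
        depth_maps.append(constrained_scene_flow_update(base_mesh, flows, ref.pose))

    # 5. Global model: place the depth maps side by side and remove
    #    overlapping regions.
    return merge_depth_maps(depth_maps)
```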
