Efficient Dense Reconstruction from Video

We present a framework for efficient reconstruction of dense scene structure from video. Sequential structure-from-motion recovers camera information from video but provides only sparse 3D points. We build a dense 3D point cloud by performing full-frame tracking and depth estimation across sequences. First, we present a novel algorithm for sequential frame selection that extracts a set of key frames with sufficient parallax for accurate depth reconstruction. Second, we introduce a technique for efficient reconstruction using dense tracking with geometrically correct optimisation of depth and orientation. Key frame selection is also performed during optimisation to provide accurate depth reconstruction for different scene elements. We test our work on benchmark footage and on scenes containing local non-rigid motion, foreground clutter and occlusions, and show performance comparable to state-of-the-art techniques. On real-world footage, we also show a substantial increase in speed over existing methods when they succeed, and successful reconstructions when they fail.
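
To make the idea of selecting key frames "with sufficient parallax" concrete, the following is a minimal sketch and not the algorithm of this paper, whose selection criterion is not given in the abstract. It assumes that camera centres and sparse 3D points are already available from structure-from-motion, and that a frame becomes a key frame once the median parallax angle to the previous key frame reaches a threshold. The names `parallax_deg`, `select_key_frames` and `min_parallax_deg`, and the use of the median, are illustrative assumptions.

```python
import math
from statistics import median


def parallax_deg(centre_a, centre_b, point):
    """Angle in degrees between the rays from two camera centres to one 3D point."""
    ray_a = [p - c for p, c in zip(point, centre_a)]
    ray_b = [p - c for p, c in zip(point, centre_b)]
    dot = sum(a * b for a, b in zip(ray_a, ray_b))
    norm_a = math.sqrt(sum(a * a for a in ray_a))
    norm_b = math.sqrt(sum(b * b for b in ray_b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # point coincides with a camera centre: no usable angle
    # Clamp to guard against rounding slightly outside [-1, 1].
    cos_angle = max(-1.0, min(1.0, dot / (norm_a * norm_b)))
    return math.degrees(math.acos(cos_angle))


def select_key_frames(centres, points, min_parallax_deg=5.0):
    """Sequentially pick key-frame indices (hypothetical criterion).

    centres: list of (x, y, z) camera centres, one per frame, in video order.
    points:  list of (x, y, z) sparse 3D points, assumed visible in all frames.
    A frame is accepted when its median parallax angle relative to the most
    recent key frame is at least min_parallax_deg.
    """
    if not centres:
        return []
    keys = [0]  # assumption: the first frame always starts the sequence
    for i in range(1, len(centres)):
        ref = centres[keys[-1]]
        angles = [parallax_deg(ref, centres[i], p) for p in points]
        if angles and median(angles) >= min_parallax_deg:
            keys.append(i)
    return keys
```

Comparing each frame only against the most recent key frame keeps the selection sequential, so it can run as frames arrive. A practical system would also need per-frame visibility of the sparse points, which this sketch ignores.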
