Discrete-continuous optimization for large-scale structure from motion

Recent work in structure from motion (SfM) has successfully built 3D models from large unstructured collections of images downloaded from the Internet. Most approaches use incremental algorithms that solve progressively larger bundle adjustment problems. These incremental techniques scale poorly as the number of images grows, and can drift or fall into bad local minima. We present an alternative formulation for SfM based on finding a coarse initial solution using a hybrid discrete-continuous optimization, and then improving that solution using bundle adjustment. The initial optimization step uses a discrete Markov random field (MRF) formulation, coupled with a continuous Levenberg-Marquardt refinement. The formulation naturally incorporates various sources of information about both the cameras and the points, including noisy geotags and vanishing point estimates. We test our method on several large-scale photo collections, including one with measured camera positions, and show that it can produce models that are similar to or better than those produced with incremental bundle adjustment, but more robustly and in a fraction of the time.
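The coarse-then-refine idea described above can be illustrated on a toy problem. The sketch below is not the paper's formulation. It assumes planar cameras with one heading angle each and noisy relative-heading measurements on a pairwise graph. The discrete step uses min-sum loopy belief propagation over a truncated cost, and the continuous step is a hand-written Levenberg-Marquardt loop restricted to edges the discrete solution treats as inliers. The function names, the label count `K`, and the truncation threshold `tau` are hypothetical choices made for illustration.

```python
import numpy as np

def wrap(a):
    """Wrap angles into [-pi, pi)."""
    return (a + np.pi) % (2 * np.pi) - np.pi

def discrete_step(n, edges, meas, K=36, tau=0.5, iters=20):
    """Coarse headings via min-sum loopy belief propagation over K labels.

    edges: unique (i, j) pairs; meas[e] is a noisy theta_j - theta_i.
    """
    labels = np.arange(K) * 2 * np.pi / K
    unary = np.zeros((n, K))
    unary[0, 1:] = 1e3  # pin camera 0 to heading 0 to remove the gauge freedom
    # V[e][a, b]: truncated cost of label a at camera i and label b at camera j.
    V = [np.minimum(np.abs(wrap(labels[None, :] - labels[:, None] - d)), tau)
         for d in meas]
    # msgs[(src, dst)] is a length-K vector indexed by the label of dst.
    msgs = {}
    for i, j in edges:
        msgs[(i, j)] = np.zeros(K)
        msgs[(j, i)] = np.zeros(K)

    def beliefs():
        b = unary.copy()
        for (_, dst), m in msgs.items():
            b[dst] += m
        return b

    for _ in range(iters):
        b = beliefs()
        new = {}
        for e, (i, j) in enumerate(edges):
            h = b[i] - msgs[(j, i)]  # exclude what j previously told i
            m = np.min(h[:, None] + V[e], axis=0)
            new[(i, j)] = m - m.min()
            h = b[j] - msgs[(i, j)]  # exclude what i previously told j
            m = np.min(h[None, :] + V[e], axis=1)
            new[(j, i)] = m - m.min()
        msgs = new
    return labels[np.argmin(beliefs(), axis=1)]

def continuous_step(theta0, edges, meas, tau=0.5, iters=20):
    """Levenberg-Marquardt refinement of headings with camera 0 held fixed."""
    theta = np.array(theta0, dtype=float)
    src = np.array([i for i, _ in edges])
    dst = np.array([j for _, j in edges])
    d = np.asarray(meas, dtype=float)
    # Assumption: keep only edges the coarse solution already explains.
    keep = np.abs(wrap(theta[dst] - theta[src] - d)) < tau
    src, dst, d = src[keep], dst[keep], d[keep]
    rows = np.arange(len(d))
    jac = np.zeros((len(d), len(theta)))
    jac[rows, dst] = 1.0
    jac[rows, src] = -1.0
    jac = jac[:, 1:]  # drop the column of the fixed camera

    def residuals(t):
        return wrap(t[dst] - t[src] - d)

    lam = 1e-3
    cost = np.sum(residuals(theta) ** 2)
    for _ in range(iters):
        r = residuals(theta)
        A = jac.T @ jac + lam * np.eye(jac.shape[1])
        step = np.linalg.solve(A, -jac.T @ r)
        trial = theta.copy()
        trial[1:] += step
        c = np.sum(residuals(trial) ** 2)
        if c < cost:  # accept the step and trust the model more
            theta, cost, lam = trial, c, max(lam * 0.1, 1e-9)
        else:         # reject the step and increase damping
            lam *= 10.0
    return theta

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 8
    truth = rng.uniform(0, 2 * np.pi, n)
    truth[0] = 0.0
    edges = [(i, j) for i in range(n) for j in range(i + 1, n) if j - i <= 3]
    meas = np.array([truth[j] - truth[i] + rng.normal(0, 0.02) for i, j in edges])
    coarse = discrete_step(n, edges, meas)
    refined = continuous_step(coarse, edges, meas)
    print("max coarse error :", np.max(np.abs(wrap(coarse - truth))))
    print("max refined error:", np.max(np.abs(wrap(refined - truth))))
```

The discrete step only needs to land each heading in the right basin. The truncated cost lets it ignore grossly wrong measurements, and the continuous step then removes the quantization error introduced by the label grid. A full SfM pipeline would also estimate 3D rotations, translations, and points, and would finish with bundle adjustment, none of which this toy covers.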
