Very Large-Scale Global SfM by Distributed Motion Averaging

Global Structure-from-Motion (SfM) techniques have demonstrated greater efficiency and accuracy than the conventional incremental approach in many recent studies. This work proposes a divide-and-conquer framework to solve very large global SfM problems at the scale of millions of images. Specifically, we first divide all images into multiple partitions that preserve strong data association, so that local motion averaging within each partition is well-posed and can run in parallel. Then, we solve a global motion averaging problem that determines the cameras at partition boundaries and a similarity transformation per partition to register all cameras in a single coordinate frame. Finally, local and global motion averaging are iterated until convergence. Since local camera poses are fixed during global motion averaging, we can avoid caching the whole reconstruction in memory at once. This distributed framework significantly enhances the efficiency and robustness of large-scale motion averaging.
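The registration step, in which each partition receives one similarity transformation that places its locally solved cameras in the common frame, can be illustrated with a toy sketch. This is not the paper's global motion averaging. It is a simplified 2D stand-in that assumes the global positions of a few boundary cameras are already known and fits the partition's transformation to them by closed-form least squares. Camera positions are represented as complex numbers, and all names (`fit_similarity_2d`, `local_cams`, `boundary_ids`, and so on) are hypothetical.

```python
import cmath

def fit_similarity_2d(local_pts, global_pts):
    """Least-squares fit of complex a, b such that global ~= a * local + b.

    abs(a) is the scale, the phase of a is the rotation angle, and b is the
    translation. This is a 2D toy; real SfM partitions need a 3D similarity.
    """
    n = len(local_pts)
    mu_l = sum(local_pts) / n
    mu_g = sum(global_pts) / n
    # Closed-form solution on centered points.
    num = sum((l - mu_l).conjugate() * (g - mu_g)
              for l, g in zip(local_pts, global_pts))
    den = sum(abs(l - mu_l) ** 2 for l in local_pts)
    a = num / den
    b = mu_g - a * mu_l
    return a, b

# Hypothetical partition: camera positions in the partition's own frame.
local_cams = {"c0": 0 + 0j, "c1": 1 + 0j, "c2": 1 + 1j, "c3": 0 + 2j}

# Assumed ground-truth similarity, used only to synthesize boundary data.
true_a = cmath.rect(2.0, cmath.pi / 6)   # scale 2, rotation 30 degrees
true_b = 3 - 1j

# Boundary cameras whose global positions are assumed to be known.
boundary_ids = ["c0", "c2", "c3"]
global_boundary = {k: true_a * local_cams[k] + true_b for k in boundary_ids}

a, b = fit_similarity_2d(
    [local_cams[k] for k in boundary_ids],
    [global_boundary[k] for k in boundary_ids],
)

# Local poses stay fixed; only the per-partition transform (a, b) is applied.
registered = {k: a * z + b for k, z in local_cams.items()}
```

Because only `a` and `b` are estimated for each partition, the interior cameras (here `c1`) never enter the fit, which mirrors the abstract's point that local camera poses remain fixed during the global step.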
