Efficient Initial Pose-graph Generation for Global SfM

We propose ways to speed up the initial pose-graph generation for global Structure-from-Motion algorithms. To avoid forming tentative point correspondences by FLANN and geometric verification by RANSAC, which are the most time-consuming steps of the pose-graph creation, we propose two new methods - built on the fact that image pairs usually are matched consecutively. Thus, candidate relative poses can be recovered from paths in the partly-built pose-graph. We propose a heuristic for the A* traversal, considering global similarity of images and the quality of the pose-graph edges. Given a relative pose from a path, descriptor-based feature matching is made "light-weight" by exploiting the known epipolar geometry. To speed up PROSAC-based sampling when RANSAC is applied, we propose a third method to order the correspondences by their inlier probabilities from previous estimations. The algorithms are tested on 402130 image pairs from the 1DSfM dataset and they speed up the feature matching 17 times and pose estimation 5 times.

[1]  David Nister,et al.  Recent developments on direct relative orientation , 2006 .

[2]  Jeff Johnson,et al.  Billion-Scale Similarity Search with GPUs , 2017, IEEE Transactions on Big Data.

[3]  Carsten Rother,et al.  Multi-View Reconstruction and Camera Recovery using a Real or Virtual Reference Plane , 2003 .

[4]  Ira Kemelmacher-Shlizerman,et al.  Global Motion Estimation from Point Matches , 2012, 2012 Second International Conference on 3D Imaging, Modeling, Processing, Visualization & Transmission.

[5]  Jiri Matas,et al.  MAGSAC++, a Fast, Reliable and Accurate Robust Estimator , 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[6]  Andrew Zisserman,et al.  Three things everyone should know to improve object retrieval , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[7]  Jan-Michael Frahm,et al.  A Vote-and-Verify Strategy for Fast Spatial Verification in Image Retrieval , 2016, ACCV.

[8]  J. M. M. Montiel,et al.  ORB-SLAM: A Versatile and Accurate Monocular SLAM System , 2015, IEEE Transactions on Robotics.

[9]  Vincent Lepetit,et al.  Learning Image Descriptors with Boosting , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[10]  Frank Dellaert,et al.  Initialization techniques for 3D SLAM: A survey on rotation estimation and its use in pose graph optimization , 2015, 2015 IEEE International Conference on Robotics and Automation (ICRA).

[11]  Pascal Monasse,et al.  Global Fusion of Relative Motions for Robust, Accurate and Scalable Structure from Motion , 2013, ICCV.

[12]  Seth J. Teller,et al.  Spectral Solution of Large-Scale Extrinsic Camera Calibration as a Graph Embedding Problem , 2004, ECCV.

[13]  Yannis Avrithis,et al.  Revisiting Oxford and Paris: Large-Scale Image Retrieval Benchmarking , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[14]  Cordelia Schmid,et al.  A performance evaluation of local descriptors , 2005, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[15]  Jiri Matas,et al.  MAGSAC: Marginalizing Sample Consensus , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[16]  Andrea Fusiello,et al.  On Computing the Translations Norm in the Epipolar Graph , 2015, 2015 International Conference on 3D Vision.

[17]  Giorgos Tolias,et al.  Fine-Tuning CNN Image Retrieval with No Human Annotation , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[18]  Cordelia Schmid,et al.  A Comparison of Affine Region Detectors , 2005, International Journal of Computer Vision.

[19]  Gary R. Bradski,et al.  ORB: An efficient alternative to SIFT or SURF , 2011, 2011 International Conference on Computer Vision.

[20]  David G. Lowe,et al.  Object recognition from local scale-invariant features , 1999, Proceedings of the Seventh IEEE International Conference on Computer Vision.

[21]  Richard Szeliski,et al.  Building Rome in a day , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[22]  Cordelia Schmid,et al.  Product Quantization for Nearest Neighbor Search , 2011, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[23]  Richard Szeliski,et al.  Modeling the World from Internet Photo Collections , 2008, International Journal of Computer Vision.

[24]  Steven M. Seitz,et al.  Photo tourism: exploring photo collections in 3D , 2006, ACM Trans. Graph..

[25]  Bernhard P. Wrobel,et al.  Multiple View Geometry in Computer Vision , 2001 .

[26]  Nils J. Nilsson,et al.  A Formal Basis for the Heuristic Determination of Minimum Cost Paths , 1968, IEEE Trans. Syst. Sci. Cybern..

[27]  Noah Snavely,et al.  Robust Global Translations with 1DSfM , 2014, ECCV.

[28]  Jan-Michael Frahm,et al.  Detailed Real-Time Urban 3D Reconstruction from Video , 2007, International Journal of Computer Vision.

[29]  Jiri Matas,et al.  Optimal Randomized RANSAC , 2008, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[30]  Jiri Matas,et al.  Graph-Cut RANSAC , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[31]  Tomás Pajdla,et al.  Robust Rotation and Translation Estimation in Multiview Reconstruction , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[32]  Pascal Fua,et al.  Image Matching Across Wide Baselines: From Paper to Practice , 2020, International Journal of Computer Vision.

[33]  Michael Isard,et al.  Object retrieval with large vocabularies and fast spatial matching , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[34]  G LoweDavid,et al.  Distinctive Image Features from Scale-Invariant Keypoints , 2004 .

[35]  Tobias Weyand,et al.  Google Landmarks Dataset v2 – A Large-Scale Benchmark for Instance-Level Recognition and Retrieval , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[36]  Andrea Vedaldi,et al.  HPatches: A Benchmark and Evaluation of Handcrafted and Learned Local Descriptors , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[37]  David G. Lowe,et al.  Fast Approximate Nearest Neighbors with Automatic Algorithm Configuration , 2009, VISAPP.

[38]  Hongdong Li,et al.  Rotation Averaging , 2013, International Journal of Computer Vision.

[39]  Venu Madhav Govindu,et al.  Lie-algebraic averaging for globally consistent motion estimation , 2004, Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2004. CVPR 2004..

[40]  Andrew Zisserman,et al.  Video Google: a text retrieval approach to object matching in videos , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[41]  Lei Zhou,et al.  Very Large-Scale Global SfM by Distributed Motion Averaging , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[42]  P. J. Narayanan,et al.  Geometry-Aware Feature Matching for Structure from Motion Applications , 2015, 2015 IEEE Winter Conference on Applications of Computer Vision.

[43]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[44]  Bohyung Han,et al.  Large-Scale Image Retrieval with Attentive Deep Local Features , 2016, 2017 IEEE International Conference on Computer Vision (ICCV).

[45]  Andrew Zisserman,et al.  MLESAC: A New Robust Estimator with Application to Estimating Image Geometry , 2000, Comput. Vis. Image Underst..

[46]  Pierre Vandergheynst,et al.  FREAK: Fast Retina Keypoint , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[47]  Changchang Wu,et al.  Towards Linear-Time Incremental Structure from Motion , 2013, 2013 International Conference on 3D Vision.

[48]  Venu Madhav Govindu,et al.  Efficient and Robust Large-Scale Rotation Averaging , 2013, 2013 IEEE International Conference on Computer Vision.

[49]  Robert C. Bolles,et al.  Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography , 1981, CACM.

[50]  Onur Özyesil,et al.  Robust camera location estimation by convex programming , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[51]  Reinhard Koch,et al.  Visual Modeling with a Hand-Held Camera , 2004, International Journal of Computer Vision.

[52]  Michael J. Fischer,et al.  An improved equivalence algorithm , 1964, CACM.

[53]  Slawomir J. Nasuto,et al.  NAPSAC: High Noise, High Dimensional Robust Estimation - it's in the Bag , 2002, BMVC.

[54]  Ping Tan,et al.  Global Structure-from-Motion by Similarity Averaging , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[55]  Venu Madhav Govindu,et al.  Combining two-view constraints for motion estimation , 2001, Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. CVPR 2001.