Paper information - Pixel-Perfect Structure-from-Motion with Featuremetric Refinement

Finding local features that are repeatable across multiple views is a cornerstone of sparse 3D reconstruction. The classical image matching paradigm detects keypoints per-image once and for all, which can yield poorly localized features and propagate large errors to the final geometry. In this paper, we refine two key steps of structure-from-motion by a direct alignment of low-level image information from multiple views: we first adjust the initial keypoint locations prior to any geometric estimation, and subsequently refine points and camera poses in a post-processing step. This refinement is robust to large detection noise and appearance changes, as it optimizes a featuremetric error based on dense features predicted by a neural network. This significantly improves the accuracy of camera poses and scene geometry for a wide range of keypoint detectors, challenging viewing conditions, and off-the-shelf deep features. Our system easily scales to large image collections, enabling pixel-perfect crowd-sourced localization at scale. Our code is publicly available at github.com/cvg/pixel-perfect-sfm as an add-on to the popular SfM software COLMAP.
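
The idea of adjusting a keypoint by minimizing a featuremetric error can be illustrated with a toy two-view sketch. This is not the pixel-perfect-sfm API or the authors' implementation: the analytic feature maps stand in for dense CNN features, the second view is assumed to be a pure translation of the first, and all function names (`make_feature_map`, `sample`, `refine_keypoint`) are hypothetical. The optimizer used here is a plain Gauss-Newton loop with finite-difference Jacobians, chosen only for brevity.

```python
import math

def make_feature_map(width, height, shift=(0.0, 0.0)):
    """Synthetic 3-channel dense feature map (stand-in for CNN features)."""
    sx, sy = shift
    return [[(math.sin(0.15 * (x - sx)),
              math.sin(0.15 * (y - sy)),
              math.cos(0.10 * ((x - sx) + (y - sy))))
             for x in range(width)] for y in range(height)]

def sample(fmap, x, y):
    """Bilinearly interpolate the feature map at a subpixel location."""
    h, w = len(fmap), len(fmap[0])
    x = min(max(x, 0.0), w - 1.001)  # clamp so that x0 + 1 stays in bounds
    y = min(max(y, 0.0), h - 1.001)
    x0, y0 = int(x), int(y)
    ax, ay = x - x0, y - y0
    return [(1 - ax) * (1 - ay) * fmap[y0][x0][c]
            + ax * (1 - ay) * fmap[y0][x0 + 1][c]
            + (1 - ax) * ay * fmap[y0 + 1][x0][c]
            + ax * ay * fmap[y0 + 1][x0 + 1][c]
            for c in range(len(fmap[0][0]))]

def refine_keypoint(fmap, reference, start, iterations=15, step=0.5):
    """Gauss-Newton on the error ||F(p) - reference||^2 over the 2D point p."""
    x, y = start
    for _ in range(iterations):
        # Residual between the current feature and the reference feature.
        r = [a - b for a, b in zip(sample(fmap, x, y), reference)]
        # Central finite differences approximate the Jacobian dF/dp.
        jx = [(a - b) / (2 * step) for a, b in
              zip(sample(fmap, x + step, y), sample(fmap, x - step, y))]
        jy = [(a - b) / (2 * step) for a, b in
              zip(sample(fmap, x, y + step), sample(fmap, x, y - step))]
        # Normal equations (J^T J) d = -J^T r, solved in closed form (2x2).
        a11 = sum(v * v for v in jx)
        a12 = sum(u * v for u, v in zip(jx, jy))
        a22 = sum(v * v for v in jy)
        b1 = sum(u * v for u, v in zip(jx, r))
        b2 = sum(u * v for u, v in zip(jy, r))
        det = a11 * a22 - a12 * a12
        if abs(det) < 1e-12:
            break  # degenerate (textureless) neighborhood: stop updating
        x += -(a22 * b1 - a12 * b2) / det
        y += -(a11 * b2 - a12 * b1) / det
    return x, y

# View B is view A translated by (3, -2) pixels (an assumption of this toy).
feat_a = make_feature_map(40, 30)
feat_b = make_feature_map(40, 30, shift=(3.0, -2.0))
kp_a = (20.0, 15.0)          # keypoint in view A, held fixed
truth_b = (23.0, 13.0)       # its true location in view B
noisy_b = (24.2, 12.1)       # poorly localized detection in view B
refined_b = refine_keypoint(feat_b, sample(feat_a, *kp_a), noisy_b)

print("error before:", math.dist(noisy_b, truth_b))
print("error after: ", math.dist(refined_b, truth_b))
```

In this sketch only one keypoint moves and the other view acts as a fixed reference. The abstract describes a broader scheme that adjusts keypoints before geometric estimation and later refines points and camera poses, which this example does not attempt to reproduce.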

[1] M. Pollefeys,et al. Back to the Feature: Learning Robust Camera Localization from Pixels to Pose , 2021, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[2] Pascal Fua,et al. Image Matching Across Wide Baselines: From Paper to Practice , 2020, International Journal of Computer Vision.

[3] Torsten Sattler,et al. Image Retrieval for Image-Based Localization Revisited , 2012, BMVC.

[4] D. Scaramuzza,et al. Reference Pose Generation for Long-term Visual Localization via Learned Features and View Synthesis , 2020, International Journal of Computer Vision.

[5] Binbin Xu,et al. Deep Probabilistic Feature-Metric Tracking , 2020, IEEE Robotics and Automation Letters.

[6] Hugo Germain,et al. S2DNet: Learning Accurate Correspondences for Sparse-to-Dense Feature Matching , 2020, ArXiv.

[7] James M. Rehg,et al. Taking a Deeper Look at the Inverse Compositional Algorithm , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[8] Nan Yang,et al. LM-Reloc: Levenberg-Marquardt Based Direct Visual Relocalization , 2020, 2020 International Conference on 3D Vision (3DV).

[9] Jiri Matas,et al. Locally Optimized RANSAC , 2003, DAGM-Symposium.

[10] Jonathan M. Garibaldi,et al. Real-Time Correlation-Based Stereo Vision with Reduced Border Errors , 2002, International Journal of Computer Vision.

[11] Stefanos Zafeiriou,et al. Feature-Based Lucas–Kanade and Active Appearance Models , 2015, IEEE Transactions on Image Processing.

[12] Torsten Sattler,et al. A Multi-view Stereo Benchmark with High-Resolution Images and Multi-camera Videos , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[13] Andrew W. Fitzgibbon,et al. Bundle Adjustment - A Modern Synthesis , 1999, Workshop on Vision Algorithms.

[14] Henrik Karstoft,et al. UnsuperPoint: End-to-end Unsupervised Interest Point Detector and Descriptor , 2019, ArXiv.

[15] Jan-Michael Frahm,et al. From Dusk Till Dawn: Modeling in the Dark , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[16] Werner A. Stahel,et al. Robust Statistics: The Approach Based on Influence Functions , 1987 .

[17] Long Quan,et al. Recurrent MVSNet for High-Resolution Multi-View Stereo Depth Inference , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[18] Jiri Matas,et al. Efficient Initial Pose-graph Generation for Global SfM , 2020, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[19] Xinghui Li,et al. Dual-Resolution Correspondence Networks , 2020, NeurIPS.

[20] Zhengqi Li,et al. MegaDepth: Learning Single-View Depth Prediction from Internet Photos , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[21] Edward Y. Chang,et al. CLKN: Cascaded Lucas-Kanade Networks for Image Alignment , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[22] Simon Baker,et al. Lucas-Kanade 20 Years On: A Unifying Framework , 2004, International Journal of Computer Vision.

[23] Nassir Navab,et al. A Unified Approach Combining Photometric and Geometric Information for Pose Estimation , 2008, BMVC.

[24] Alexei A. Efros,et al. RANSAC-Flow: generic two-stage image alignment , 2020, ECCV.

[25] Vincent Lepetit,et al. DAISY: An Efficient Dense Descriptor Applied to Wide-Baseline Stereo , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[26] Torsten Sattler,et al. Patch2Pix: Epipolar-Guided Pixel-Level Correspondences , 2020, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[27] Rares Ambrus,et al. Neural Outlier Rejection for Self-Supervised Keypoint Learning , 2019, ICLR.

[28] Vincent Lepetit,et al. LIFT: Learned Invariant Feature Transform , 2016, ECCV.

[29] Vincent Lepetit,et al. Neural Reprojection Error: Merging Feature Learning and Camera Pose Estimation , 2021, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[30] Daniel Cremers,et al. Dense visual SLAM for RGB-D cameras , 2013, 2013 IEEE/RSJ International Conference on Intelligent Robots and Systems.

[31] G LoweDavid,et al. Distinctive Image Features from Scale-Invariant Keypoints , 2004 .

[32] Marc Pollefeys,et al. Photometric Bundle Adjustment for Dense Multi-view 3D Modeling , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[33] Ping Tan,et al. BA-Net: Dense Bundle Adjustment Network , 2018, ICLR 2018.

[34] D. Cremers,et al. GN-Net: The Gauss-Newton Loss for Multi-Weather Relocalization , 2019, IEEE Robotics and Automation Letters.

[35] Zehao Yu,et al. Fast-MVSNet: Sparse-to-Dense Multi-View Stereo With Learned Propagation and Gauss-Newton Refinement , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[36] Torsten Sattler,et al. Benchmarking 6DOF Outdoor Visual Localization in Changing Conditions , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[37] Richard Szeliski,et al. Modeling the World from Internet Photo Collections , 2008, International Journal of Computer Vision.

[38] Venu Madhav Govindu,et al. Efficient and Robust Large-Scale Rotation Averaging , 2013, 2013 IEEE International Conference on Computer Vision.

[39] SuperGlue: Learning Feature Matching With Graph Neural Networks , 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[40] Christopher G. Harris,et al. A Combined Corner and Edge Detector , 1988, Alvey Vision Conference.

[41] Silvio Savarese,et al. Universal Correspondence Network , 2016, NIPS.

[42] Christopher Hunt,et al. Notes on the OpenSURF Library , 2009 .

[43] Pascal Fua,et al. LF-Net: Learning Local Features from Images , 2018, NeurIPS.

[44] Richard Szeliski,et al. Pushing the Envelope of Modern Methods for Bundle Adjustment , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[45] Jiri Matas,et al. Two-view geometry estimation unaffected by a dominant plane , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[46] Daniel Cremers,et al. Direct Sparse Odometry , 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[47] Martin Danelljan,et al. GLU-Net: Global-Local Universal Network for Dense Flow and Correspondences , 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[48] Tomás Pajdla,et al. Robust Rotation and Translation Estimation in Multiview Reconstruction , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[49] Torsten Sattler,et al. D2-Net: A Trainable CNN for Joint Detection and Description of Local Features , 2019, CVPR 2019.

[50] Jan-Michael Frahm,et al. Building Rome on a Cloudless Day , 2010, ECCV.

[51] Josef Sivic,et al. Efficient Neighbourhood Consensus Networks via Submanifold Sparse Convolutions , 2020, ECCV.

[52] Daniel Barath,et al. Optimal Multi-view Correction of Local Affine Frames , 2019, BMVC.

[53] Marc Pollefeys,et al. Online Invariance Selection for Local Feature Descriptors , 2020, ECCV.

[54] Pascal Fua,et al. DISK: Learning local features with policy gradient , 2020, NeurIPS.

[55] Johannes L. Schönberger,et al. Multi-View Optimization of Local Feature Geometry , 2020, ECCV.

[56] Long Quan,et al. MVSNet: Depth Inference for Unstructured Multi-view Stereo , 2018, ECCV.

[57] Pascal Fua,et al. Worldwide Pose Estimation Using 3D Point Clouds , 2012, ECCV.

[58] P. Holland,et al. Robust regression using iteratively reweighted least-squares , 1977 .

[59] David G. Lowe,et al. Distinctive Image Features from Scale-Invariant Keypoints , 2004, International Journal of Computer Vision.

[60] Richard Szeliski,et al. Bundle Adjustment in the Large , 2010, ECCV.

[61] Torsten Sattler,et al. BAD SLAM: Bundle Adjusted Direct RGB-D SLAM , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[62] Gabriela Csurka,et al. R2D2: Repeatable and Reliable Detector and Descriptor , 2019, ArXiv.

[63] Stefan Leutenegger,et al. Semantic Texture for Robust Dense Tracking , 2017, 2017 IEEE International Conference on Computer Vision Workshops (ICCVW).

[64] Torsten Sattler,et al. InLoc: Indoor Visual Localization with Dense Matching and View Synthesis , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[65] Roland Siegwart,et al. From Coarse to Fine: Robust Hierarchical Localization at Large Scale , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[66] H. Bischof,et al. From structure-from-motion point clouds to fast location recognition , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[67] Richard Szeliski,et al. Building Rome in a day , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[68] Brett Browning,et al. Photometric Bundle Adjustment for Vision-Based SLAM , 2016, ACCV.

[69] Takeo Kanade,et al. An Iterative Image Registration Technique with an Application to Stereo Vision , 1981, IJCAI.

[70] Tomasz Malisiewicz,et al. Deep ChArUco: Dark ChArUco Marker Pose Estimation , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[71] Yuchao Dai,et al. Efficient Global 2 D-3 D Matching for Camera Localization in a Large-Scale 3 D Map , 2017 .

[72] Gérard G. Medioni,et al. Detection of Intensity Changes with Subpixel Accuracy Using Laplacian-Gaussian Masks , 1986, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[73] Tomasz Malisiewicz,et al. SuperPoint: Self-Supervised Interest Point Detection and Description , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[74] Stefan Leutenegger,et al. LS-Net: Learning to Solve Nonlinear Least Squares for Monocular Stereo , 2018, ECCV.

[75] Tomás Pajdla,et al. Neighbourhood Consensus Networks , 2018, NeurIPS.

[76] Antonio Torralba,et al. SIFT Flow: Dense Correspondence across Scenes and Its Applications , 2011, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[77] Kenneth Levenberg. A METHOD FOR THE SOLUTION OF CERTAIN NON – LINEAR PROBLEMS IN LEAST SQUARES , 1944 .

[78] Hujun Bao,et al. LoFTR: Detector-Free Local Feature Matching with Transformers , 2021, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[79] Hongdong Li,et al. Efficient Global 2D-3D Matching for Camera Localization in a Large-Scale 3D Map , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[80] Daniel Cremers,et al. LSD-SLAM: Large-Scale Direct Monocular SLAM , 2014, ECCV.

[81] Dacheng Tao,et al. Heatmap Regression via Randomized Rounding , 2020, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[82] Jan-Michael Frahm,et al. Pixelwise View Selection for Unstructured Multi-View Stereo , 2016, ECCV.

[83] Fei-Fei Li,et al. ImageNet: A large-scale hierarchical image database , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[84] D. Scharstein,et al. A Taxonomy and Evaluation of Dense Two-Frame Stereo Correspondence Algorithms , 2001, Proceedings IEEE Workshop on Stereo and Multi-Baseline Vision (SMBV 2001).

[85] Andrea Vedaldi,et al. Vlfeat: an open and portable library of computer vision algorithms , 2010, ACM Multimedia.

[86] Jan Kautz,et al. PWC-Net: CNNs for Optical Flow Using Pyramid, Warping, and Cost Volume , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[87] Olivier D. Faugeras,et al. Computing differential properties of 3-D shapes from stereoscopic images without 3-D models , 1994, 1994 Proceedings of IEEE Conference on Computer Vision and Pattern Recognition.

[88] Tom Drummond,et al. Machine Learning for High-Speed Corner Detection , 2006, ECCV.

[89] Oliver J. Woodford,et al. Large Scale Photometric Bundle Adjustment , 2020, BMVC.

[90] Jan-Michael Frahm,et al. Structure-from-Motion Revisited , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[91] Brett Browning,et al. Robust Tracking in Low Light and Sudden Illumination Changes , 2016, 2016 Fourth International Conference on 3D Vision (3DV).

[92] Marc Pollefeys,et al. Illumination change robustness in direct visual SLAM , 2017, 2017 IEEE International Conference on Robotics and Automation (ICRA).

[93] Thomas Brox,et al. FlowNet 2.0: Evolution of Optical Flow Estimation with Deep Networks , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[94] Jan-Michael Frahm,et al. Reconstructing the World* in Six Days *(As Captured by the Yahoo 100 Million Image Dataset) , 2015, CVPR 2015.