DGC-Net: Dense Geometric Correspondence Network

This paper addresses the challenge of dense pixel correspondence estimation between two images. This problem is closely related to optical flow estimation task where ConvNets (CNNs) have recently achieved significant progress. While optical flow methods produce very accurate results for the small pixel translation and limited appearance variation scenarios, they hardly deal with the strong geometric transformations that we consider in this work. In this paper, we propose a coarse-to-fine CNN-based framework that can leverage the advantages of optical flow approaches and extend them to the case of large transformations providing dense and subpixel accurate estimates. It is trained on synthetic transformations and demonstrates very good performance to unseen, realistic, data. Further, we apply our method to the problem of relative camera pose estimation and demonstrate that the model outperforms existing dense approaches.

[1]  Michael J. Black,et al.  Supplementary Material for Unsupervised Learning of Multi-Frame Optical Flow with Occlusions , 2018 .

[2]  Silvio Savarese,et al.  Universal Correspondence Network , 2016, NIPS.

[3]  Esa Rahtu,et al.  Image Patch Matching Using Convolutional Descriptors with Euclidean Distance , 2016, ACCV Workshops.

[4]  Andrea Vedaldi,et al.  HPatches: A Benchmark and Evaluation of Handcrafted and Learned Local Descriptors , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[5]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[6]  G LoweDavid,et al.  Distinctive Image Features from Scale-Invariant Keypoints , 2004 .

[7]  Nikos Komodakis,et al.  Learning to compare image patches via convolutional neural networks , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[8]  Tomasz Malisiewicz,et al.  Deep Image Homography Estimation , 2016, ArXiv.

[9]  Jean Ponce,et al.  SCNet: Learning Semantic Correspondence , 2017, ICCV.

[10]  Thomas Brox,et al.  FlowNet 2.0: Evolution of Optical Flow Estimation with Deep Networks , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[11]  Luca Antiga,et al.  Automatic differentiation in PyTorch , 2017 .

[12]  Anders Bjorholm Dahl,et al.  Large-Scale Data for Multiple-View Stereopsis , 2016, International Journal of Computer Vision.

[13]  Luc Van Gool,et al.  Speeded-Up Robust Features (SURF) , 2008, Comput. Vis. Image Underst..

[14]  Yann LeCun,et al.  Computing the stereo matching cost with a convolutional neural network , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[15]  Ersin Yumer,et al.  ST-GAN: Spatial Transformer Generative Adversarial Networks for Image Compositing , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[16]  Jimmy Ba,et al.  Adam: A Method for Stochastic Optimization , 2014, ICLR.

[17]  Alexey Shvets,et al.  TernausNet: U-Net with VGG11 Encoder Pre-Trained on ImageNet for Image Segmentation , 2018, Computer-Aided Analysis of Gastrointestinal Videos.

[18]  Torsten Sattler,et al.  Semantic Visual Localization , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[19]  Josef Sivic,et al.  Convolutional Neural Network Architecture for Geometric Matching , 2019, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[20]  David Stutz,et al.  Neural Codes for Image Retrieval , 2015 .

[21]  Jan-Michael Frahm,et al.  Structure-from-Motion Revisited , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[22]  Simon Fuhrmann,et al.  MVE - A Multi-View Reconstruction Environment , 2014, GCH.

[23]  Cordelia Schmid,et al.  DeepMatching: Hierarchical Deformable Dense Matching , 2015, International Journal of Computer Vision.

[24]  Simon Fuhrmann,et al.  MVE-A Multiview Reconstruction Environment , 2014 .

[25]  Gang Hua,et al.  Visual attribute transfer through deep image analogy , 2017, ACM Trans. Graph..

[26]  Takeo Kanade,et al.  An Iterative Image Registration Technique with an Application to Stereo Vision , 1981, IJCAI.

[27]  Albert Gordo,et al.  Deep Image Retrieval: Learning Global Representations for Image Search , 2016, ECCV.

[28]  Minh N. Do,et al.  Bilateral Functions for Global Motion Modeling , 2014, ECCV.

[29]  Esa Rahtu,et al.  Relative Camera Pose Estimation Using Convolutional Neural Networks , 2017, ACIVS.

[30]  Jan Kautz,et al.  PWC-Net: CNNs for Optical Flow Using Pyramid, Warping, and Cost Volume , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[31]  Jean-Michel Morel,et al.  ASIFT: A New Framework for Fully Affine Invariant Image Comparison , 2009, SIAM J. Imaging Sci..

[32]  Jiri Matas,et al.  MODS: Fast and robust method for two-view matching , 2015, Comput. Vis. Image Underst..

[33]  Raquel Urtasun,et al.  Efficient Deep Learning for Stereo Matching , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[34]  Juho Kannala,et al.  Semi-supervised Semantic Matching , 2018, ECCV Workshops.

[35]  Paul Vernaza,et al.  Hierarchical Metric Learning and Matching for 2D and 3D Geometric Correspondences , 2018, ECCV.

[36]  Krystian Mikolajczyk,et al.  PN-Net: Conjoined Triple Deep Network for Learning Local Image Descriptors , 2016, ArXiv.

[37]  Li Fei-Fei,et al.  ImageNet: A large-scale hierarchical image database , 2009, CVPR.

[38]  Michael J. Black,et al.  Optical Flow Estimation Using a Spatial Pyramid Network , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[39]  Torsten Sattler,et al.  Semantic Match Consistency for Long-Term Visual Localization , 2018, ECCV.

[40]  Torsten Sattler,et al.  InLoc: Indoor Visual Localization with Dense Matching and View Synthesis , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[41]  Thomas Brox,et al.  DeMoN: Depth and Motion Network for Learning Monocular Stereo , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).