Deep Fundamental Matrix Estimation without Correspondences

Estimating fundamental matrices is a classic problem in computer vision. Traditional methods rely heavily on the correctness of estimated key-point correspondences, which can be noisy and unreliable. As a result, these methods struggle with image pairs that exhibit large occlusions or significantly different camera poses. In this paper, we propose novel neural network architectures that estimate fundamental matrices end-to-end, without relying on point correspondences. New modules and layers are introduced to preserve the mathematical properties of the fundamental matrix as a homogeneous rank-2 matrix with seven degrees of freedom. We analyze the performance of the proposed models using various metrics on the KITTI dataset, and show that they achieve performance competitive with traditional methods without the need to extract correspondences.
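The constraints named above can be made concrete with a short sketch. A 3x3 matrix has nine entries; defining it only up to scale removes one degree of freedom, and the rank-2 condition (zero determinant) removes another, leaving seven. The NumPy snippet below illustrates one standard way to impose both constraints on an arbitrary 3x3 matrix: zero the smallest singular value, then rescale to unit Frobenius norm. It is an illustration only, not the layers proposed in the paper, and the function name `project_to_fundamental` is hypothetical.

```python
import numpy as np


def project_to_fundamental(M):
    """Map an arbitrary 3x3 matrix to a rank-2 matrix of unit Frobenius norm.

    Illustrative only: this is the classical SVD projection, not the
    network layers described in the paper.
    """
    U, s, Vt = np.linalg.svd(M)
    s[2] = 0.0                        # drop the smallest singular value -> rank 2
    F = U @ np.diag(s) @ Vt
    return F / np.linalg.norm(F)      # fix the arbitrary scale (homogeneity)


# Stand-in for an unconstrained 9-value network output (assumed, random here).
rng = np.random.default_rng(0)
raw = rng.normal(size=(3, 3))
F = project_to_fundamental(raw)

rank = np.linalg.matrix_rank(F, tol=1e-9)
det = np.linalg.det(F)
norm = np.linalg.norm(F)
```

Fixing the Frobenius norm is only one of several possible scale conventions; dividing by a chosen nonzero entry would serve the same purpose.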
