Rethinking Planar Homography Estimation Using Perspective Fields

Planar homography estimation refers to the problem of computing a bijective linear mapping of pixels between two images. While this problem has been studied with convolutional neural networks (CNNs), existing methods simply regress the location of the four corners using a dense layer preceded by a fully-connected layer. This vector representation damages the spatial structure of the corners since they have a clear spatial order. Moreover, four points are the minimum required to compute the homography, and so such an approach is susceptible to perturbation. In this paper, we propose a conceptually simple, reliable, and general framework for homography estimation. In contrast to previous works, we formulate this problem as a perspective field (PF), which models the essence of the homography - pixel-to-pixel bijection. The PF is naturally learned by the proposed fully convolutional residual network, PFNet, to keep the spatial order of each pixel. Moreover, since every pixels’ displacement can be obtained from the PF, it enables robust homography estimation by utilizing dense correspondences. Our experiments demonstrate the proposed method outperforms traditional correspondence-based approaches and state-of-the-art CNN approaches in terms of accuracy while also having a smaller network size. In addition, the new parameterization of this task is general and can be implemented by any fully convolutional network (FCN) architecture.

[1]  Geoffrey E. Hinton,et al.  On the importance of initialization and momentum in deep learning , 2013, ICML.

[2]  Nassir Navab,et al.  Deeper Depth Prediction with Fully Convolutional Residual Networks , 2016, 2016 Fourth International Conference on 3D Vision (3DV).

[3]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[4]  Jan Flusser,et al.  Image registration methods: a survey , 2003, Image Vis. Comput..

[5]  Luc Van Gool,et al.  SURF: Speeded Up Robust Features , 2006, ECCV.

[6]  Kilian Q. Weinberger,et al.  Densely Connected Convolutional Networks , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[7]  Cordelia Schmid,et al.  A Performance Evaluation of Local Descriptors , 2005, IEEE Trans. Pattern Anal. Mach. Intell..

[8]  Yaser Sheikh,et al.  Deltille Grids for Geometric Camera Calibration , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[9]  Vishal M. Patel,et al.  Densely Connected Pyramid Dehazing Network , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[10]  Yi Zhu,et al.  DenseNet for dense flow , 2017, 2017 IEEE International Conference on Image Processing (ICIP).

[11]  Junjun Jiang,et al.  Robust Feature Matching for Remote Sensing Image Registration via Locally Linear Transforming , 2015, IEEE Transactions on Geoscience and Remote Sensing.

[12]  Gary R. Bradski,et al.  ORB: An efficient alternative to SIFT or SURF , 2011, 2011 International Conference on Computer Vision.

[13]  Vijay Kumar,et al.  Unsupervised Deep Homography: A Fast and Robust Homography Estimation Model , 2017, IEEE Robotics and Automation Letters.

[14]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[15]  Robert Laganière,et al.  Homography Estimation from Image Pairs with Hierarchical Convolutional Networks , 2017, 2017 IEEE International Conference on Computer Vision Workshops (ICCVW).

[16]  Maoguo Gong,et al.  A Novel Coarse-to-Fine Scheme for Automatic Image Registration Based on SIFT and Mutual Information , 2014, IEEE Transactions on Geoscience and Remote Sensing.

[17]  Jiri Matas,et al.  Planar Affine Rectification from Change of Scale , 2010, ACCV.

[18]  Yoshua Bengio,et al.  Understanding the difficulty of training deep feedforward neural networks , 2010, AISTATS.

[19]  Pascal Monasse,et al.  Three-step image rectification , 2010, BMVC.

[20]  Yuan Yu,et al.  TensorFlow: A system for large-scale machine learning , 2016, OSDI.

[21]  Fabio Morbidi,et al.  Practical and accurate calibration of RGB-D cameras using spheres , 2015, Comput. Vis. Image Underst..

[22]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[23]  David G. Lowe,et al.  Distinctive Image Features from Scale-Invariant Keypoints , 2004, International Journal of Computer Vision.

[24]  Pietro Perona,et al.  Microsoft COCO: Common Objects in Context , 2014, ECCV.