A Hybrid Approach to Wide-Baseline Image Matching

Recent works such as DEEPDESC and DEEPCOMPARE have proposed learning robust local image descriptors directly from images with a Siamese convolutional neural network, instead of handcrafting them as in traditional descriptors such as SIFT and MROGH. Though these algorithms achieve state-of-the-art results on the Multi-View Stereo (MVS) dataset, they fail at many challenging real-world tasks such as stitching image panoramas, primarily because of their limited performance in finding correspondences. In this paper, we propose a novel hybrid algorithm that harnesses the power of a learning-based approach together with the discriminative advantages that traditional descriptors offer. We also propose the PhotoSynth dataset, which is an order of magnitude larger than the traditional MVS dataset in terms of the number of scenes, images, and patches, along with positive and negative correspondences. Our PhotoSynth dataset also covers the overall viewpoint, scale, and lighting challenges better than the MVS dataset. We evaluate our approach on two datasets that provide images with large viewpoint differences and wide baselines. The first is the Graffiti scene from the Oxford Affine Covariant Regions Dataset (ACRD), for matching images related by 2D affine transformations. The second is the Fountain-P11 dataset, for images related by 3D projective transformations. To the best of our knowledge, we report the best results to date on the ACRD Graffiti scene, compared with handcrafted descriptors such as SIFT and MROGH and with learnt descriptors such as DEEPDESC.
