Learning Discriminative Affine Regions via Discriminability

We present an accurate method for estimation of the affine shape of local features. The method is trained in a novel way, exploiting the recently proposed HardNet triplet loss. The loss function is driven by patch descriptor differences, avoiding problems with symmetries. Moreover, such training process does not require precisely geometrically aligned patches. The affine shape is represented in a way amenable to learning by stochastic gradient descent. When plugged into a state-of-the-art wide baseline matching algorithm, the performance on standard datasets improves in both the number of challenging pairs matched and the number of inliers. Finally, AffNet with combination of Hessian detector and HardNet descriptor improves bag-of-visual-words based state of the art on Oxford5k and Paris6k by large margin, 4.5 and 4.2 mAP points respectively. The source code and trained networks are available at this https URL

[1]  Yannis Avrithis,et al.  Revisiting Oxford and Paris: Large-Scale Image Retrieval Benchmarking , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[2]  Luca Antiga,et al.  Automatic differentiation in PyTorch , 2017 .

[3]  C. Lawrence Zitnick,et al.  Edge foci interest points , 2011, 2011 International Conference on Computer Vision.

[4]  Matthew A. Brown,et al.  Automatic Panoramic Image Stitching using Invariant Features , 2007, International Journal of Computer Vision.

[5]  Andrea Vedaldi,et al.  HPatches: A Benchmark and Evaluation of Handcrafted and Learned Local Descriptors , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[6]  Rama Chellappa,et al.  Designing Deep Convolutional Neural Networks for Continuous Object Orientation Estimation , 2017, ArXiv.

[7]  Thomas Brox,et al.  Discriminative Unsupervised Feature Learning with Exemplar Convolutional Neural Networks , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[8]  Giorgos Tolias,et al.  Fine-Tuning CNN Image Retrieval with No Human Annotation , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[9]  Adam Baumberg,et al.  Reliable feature matching across widely separated views , 2000, Proceedings IEEE Conference on Computer Vision and Pattern Recognition. CVPR 2000 (Cat. No.PR00662).

[10]  Cordelia Schmid,et al.  Accurate Image Search Using the Contextual Dissimilarity Measure , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[11]  Jan-Michael Frahm,et al.  From single image query to detailed 3D reconstruction , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[12]  Jan-Michael Frahm,et al.  Structure-from-Motion Revisited , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[13]  Tinne Tuytelaars,et al.  Location recognition over large time lags , 2014, Comput. Vis. Image Underst..

[14]  Chia-Ling Tsai,et al.  Registration of Challenging Image Pairs: Initialization, Estimation, and Decision , 2007, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[15]  Cordelia Schmid,et al.  Improving Bag-of-Features for Large Scale Image Search , 2010, International Journal of Computer Vision.

[16]  Jan-Michael Frahm,et al.  From Dusk Till Dawn: Modeling in the Dark , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[17]  Krystian Mikolajczyk,et al.  Learning local feature descriptors with triplets and shallow convolutional neural networks , 2016, BMVC.

[18]  Konrad Schindler,et al.  Predicting Matchability , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[19]  Albert Gordo,et al.  End-to-End Learning of Deep Visual Representations for Image Retrieval , 2016, International Journal of Computer Vision.

[20]  Torsten Sattler,et al.  Comparative Evaluation of Hand-Crafted and Learned Local Features , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[21]  Vincent Lepetit,et al.  LIFT: Learned Invariant Feature Transform , 2016, ECCV.

[22]  Yannis Avrithis,et al.  Efficient Diffusion on Region Manifolds: Recovering Small Objects with Compact CNN Representations , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[23]  Jiri Matas,et al.  WxBS: Wide Baseline Stereo Generalizations , 2015, BMVC.

[24]  David G. Lowe,et al.  Fast Approximate Nearest Neighbors with Automatic Algorithm Configuration , 2009, VISAPP.

[25]  Jiri Matas,et al.  Efficient representation of local geometry for large scale object retrieval , 2009, CVPR.

[26]  Vincent Lepetit,et al.  Learning to Assign Orientations to Feature Points , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[27]  Michael Isard,et al.  Object retrieval with large vocabularies and fast spatial matching , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[28]  Michael Isard,et al.  Lost in quantization: Improving particular object retrieval in large scale image databases , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[29]  Vincent Lepetit,et al.  TILDE: A Temporally Invariant Learned DEtector , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[30]  Charles V. Stewart,et al.  Keypoint Descriptors for Matching Across Multiple Image Modalities and Non-linear Intensity Variations , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[31]  G LoweDavid,et al.  Distinctive Image Features from Scale-Invariant Keypoints , 2004 .

[32]  Rahul Sukthankar,et al.  MatchNet: Unifying feature and metric learning for patch-based matching , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[33]  Andrea Vedaldi,et al.  Learning Covariant Feature Detectors , 2016, ECCV Workshops.

[34]  Jiri Matas,et al.  Working hard to know your neighbor's margins: Local descriptor learning loss , 2017, NIPS.

[35]  Jimmy Ba,et al.  Adam: A Method for Stochastic Optimization , 2014, ICLR.

[36]  Nitish Srivastava,et al.  Dropout: a simple way to prevent neural networks from overfitting , 2014, J. Mach. Learn. Res..

[37]  Jiri Matas,et al.  Systematic evaluation of convolution neural network advances on the Imagenet , 2017, Comput. Vis. Image Underst..

[38]  Andrew Zisserman,et al.  Video Google: a text retrieval approach to object matching in videos , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[39]  Noah Snavely,et al.  Image matching using local symmetry features , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[40]  Sergey Ioffe,et al.  Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift , 2015, ICML.

[41]  Takeo Kanade,et al.  An Iterative Image Registration Technique with an Application to Stereo Vision , 1981, IJCAI.

[42]  Shih-Fu Chang,et al.  Learning Discriminative and Transformation Covariant Local Feature Detectors , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[43]  Bohyung Han,et al.  Large-Scale Image Retrieval with Attentive Deep Local Features , 2016, 2017 IEEE International Conference on Computer Vision (ICCV).

[44]  C. Schmid,et al.  On the burstiness of visual elements , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[45]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[46]  Nikos Komodakis,et al.  Learning to compare image patches via convolutional neural networks , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[47]  Zuzana Kukelova,et al.  Radially-Distorted Conjugate Translations , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[48]  Jiri Matas,et al.  MODS: Fast and robust method for two-view matching , 2015, Comput. Vis. Image Underst..

[49]  Andrew Zisserman,et al.  Three things everyone should know to improve object retrieval , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[50]  Bin Fan,et al.  L2-Net: Deep Learning of Discriminative Patch Descriptor in Euclidean Space , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).