Repeatability Is Not Enough: Learning Affine Regions via Discriminability

A method for learning local affine-covariant regions is presented. We show that maximizing geometric repeatability does not lead to local regions, a.k.a. features, that are reliably matched, and that this necessitates descriptor-based learning. We explore the factors that influence such learning and registration: the loss function, the descriptor type, the geometric parametrization, and the trade-off between matchability and geometric accuracy. We also propose a novel hard negative-constant loss function for learning affine regions. The affine shape estimator, AffNet, trained with the hard negative-constant loss, outperforms the state of the art in bag-of-words image retrieval and wide baseline stereo. The proposed training process does not require precisely geometrically aligned patches. The source code and trained weights are available at https://github.com/ducha-aiki/affnet.
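To make the idea of a "hard negative-constant" loss more concrete, the following is a minimal sketch under stated assumptions, not the authors' implementation. It assumes the loss is a hardest-in-batch triplet margin loss in the style of HardNet, with the hardest-negative distance held constant so that gradients flow only through the distance between matching descriptors. The function name `hard_negative_constant_loss`, the helper `l2`, and the default `margin=1.0` are hypothetical choices for illustration. In an autograd framework the same effect would come from detaching the negative distance (for example `Tensor.detach()` in PyTorch); here the gradient is written out by hand so the sketch runs with the Python standard library only.

```python
import math
from typing import List, Sequence, Tuple

Vec = Sequence[float]


def l2(a: Vec, b: Vec) -> float:
    """Euclidean distance between two descriptors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def hard_negative_constant_loss(
    anchors: List[Vec], positives: List[Vec], margin: float = 1.0
) -> Tuple[float, List[List[float]]]:
    """Hypothetical sketch: hardest-in-batch triplet margin loss with a constant negative term.

    anchors[i] and positives[i] describe corresponding patches.
    Returns the mean loss and its gradient w.r.t. each anchor descriptor.
    ASSUMPTION: "negative-constant" means the hardest-negative distance is
    treated as a constant, so it contributes nothing to the gradient.
    """
    n = len(anchors)
    if n < 2:
        raise ValueError("need at least two pairs to mine negatives")
    dim = len(anchors[0])
    # dist[i][j] = distance between anchor i and positive j
    dist = [[l2(anchors[i], positives[j]) for j in range(n)] for i in range(n)]
    total = 0.0
    grads = [[0.0] * dim for _ in range(n)]
    for i in range(n):
        d_pos = dist[i][i]
        # Hardest negative for pair i: the closest non-matching descriptor,
        # searched along row i and column i of the distance matrix.
        d_neg = min(
            min(dist[i][j] for j in range(n) if j != i),
            min(dist[k][i] for k in range(n) if k != i),
        )
        hinge = margin + d_pos - d_neg
        if hinge > 0.0:
            total += hinge
            # Only d_pos is differentiated: d(d_pos)/d(a_i) = (a_i - p_i) / d_pos,
            # scaled by 1/n because the loss is a mean over the batch.
            if d_pos > 0.0:
                for c in range(dim):
                    grads[i][c] = (anchors[i][c] - positives[i][c]) / (d_pos * n)
    return total / n, grads


if __name__ == "__main__":
    loss, grads = hard_negative_constant_loss(
        [[0.0, 0.0], [1.0, 0.0]], [[0.0, 1.0], [1.0, 1.0]]
    )
    print(round(loss, 4), grads)  # 0.5858 [[0.0, -0.5], [0.0, -0.5]]
```

Because the negative term is constant, the gradient only pulls each anchor toward its own positive; it never pushes the hardest negative away. Under this reading, that is the design difference from a standard triplet loss, where both terms are differentiated.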
