$\mathbb{H}$-Patches: A Benchmark and Evaluation of Handcrafted and Learned Loca

In this paper, we introduce a novel benchmark for evaluating local image descriptors. We demonstrate limitations of the commonly used datasets and evaluation protocols that lead to ambiguities and contradictory results in the literature. Furthermore, these benchmarks are nearly saturated due to the recent improvements in local descriptors obtained by learning from large annotated datasets. To address these issues, we introduce a new large dataset suitable for training and testing modern descriptors, together with strictly defined evaluation protocols in several tasks such as matching, retrieval and verification. This allows for more realistic, and thus more reliable, comparisons in different application scenarios. We evaluate the performance of several state-of-the-art descriptors and analyse their properties. We show that a simple normalisation of traditional hand-crafted descriptors is able to boost their performance to the level of deep-learning-based descriptors once realistic benchmarks are considered. Additionally, we specify a protocol for learning and evaluating using cross-validation. We show that when training state-of-the-art descriptors on this dataset, the traditional verification task is almost entirely saturated.
