A comprehensive evaluation of local detectors and descriptors

Abstract: Local detectors and descriptors find and represent distinctive keypoints in an image, and many keypoint detection and description methods have been proposed. Each method has particular advantages and limitations and may suit different contexts. In this paper, we evaluate the performance of a wide set of local detectors and descriptors. First, we compare diverse local detectors with regard to repeatability, and local descriptors in terms of recall and precision. Next, we apply the visual words model, constructed from both real-valued and binary string local descriptors, to large-scale image search. The evaluation results reveal some strengths and weaknesses of recent binary string descriptors compared with notable real-valued descriptors. Finally, we integrate the local detectors and descriptors with the fully affine space framework and evaluate their performance under major viewpoint transformations. The presented comparative experimental studies can support researchers in choosing an appropriate local detector and descriptor for their specific computer vision applications.
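
The abstract names repeatability, recall and precision as the evaluation criteria without defining them. As a minimal sketch, the functions below assume the definitions commonly used in the detector and descriptor evaluation literature: repeatability as the number of corresponding regions divided by the smaller region count of the two images, recall as correct matches over ground-truth correspondences, and precision as correct matches over all returned matches. The function and parameter names are illustrative and are not taken from the paper, which may use a different protocol.

```python
def repeatability(num_correspondences, num_regions_a, num_regions_b):
    """Repeatability of a detector on an image pair (assumed definition):
    corresponding regions divided by the smaller of the two region counts."""
    smaller = min(num_regions_a, num_regions_b)
    if smaller == 0:
        return 0.0  # no regions detected in one image, so nothing can repeat
    return num_correspondences / smaller


def recall_and_precision(num_correct, num_false, num_correspondences):
    """Recall and precision of descriptor matching (assumed definitions).

    num_correct: matches that agree with the ground truth
    num_false: matches that do not agree with the ground truth
    num_correspondences: ground-truth correspondences between the two images
    """
    total_matches = num_correct + num_false
    recall = num_correct / num_correspondences if num_correspondences else 0.0
    precision = num_correct / total_matches if total_matches else 0.0
    return recall, precision


if __name__ == "__main__":
    # Hypothetical counts, chosen only to illustrate the arithmetic.
    print(repeatability(30, 50, 60))
    print(recall_and_precision(40, 10, 80))
```

Evaluations of this kind often plot recall against 1 - precision while sweeping the matching threshold; the sketch computes a single point on such a curve.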
