Understanding and Improving Kernel Local Descriptors

We propose a multiple-kernel local-patch descriptor based on efficient match kernels from pixel gradients. It combines two parametrizations of gradient position and direction, each providing robustness to a different type of patch mis-registration: the polar parametrization to noise in the detected dominant orientation of the patch, and the Cartesian parametrization to imprecise localization of the feature point. Whitening of the descriptor space, learned with or without supervision, significantly improves performance. We analyze the effect of the whitening on patch similarity and demonstrate its semantic meaning. Our unsupervised variant is the best-performing descriptor constructed without the need for labeled data. Despite its simplicity, the proposed descriptor competes well with deep learning approaches on a number of different tasks.
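As an illustration of the unsupervised whitening step, the following is a minimal sketch of plain PCA whitening learned from a set of unlabeled descriptors. It is an assumption about the general technique, not the exact procedure of the paper: the function names `learn_pca_whitening` and `apply_whitening`, the regularizer `eps`, and the final L2 normalization are illustrative choices.

```python
import numpy as np

def learn_pca_whitening(X, eps=1e-8):
    """Learn a PCA whitening transform from unlabeled descriptors.

    X is an (n, d) array with one descriptor per row.
    Returns the mean (d,) and a projection matrix P (d, d).
    `eps` is a hypothetical regularizer guarding against tiny eigenvalues.
    """
    mean = X.mean(axis=0)
    Xc = X - mean
    # Sample covariance of the centered descriptors.
    cov = Xc.T @ Xc / (X.shape[0] - 1)
    # eigh returns eigenvalues in ascending order for a symmetric matrix.
    eigvals, eigvecs = np.linalg.eigh(cov)
    # Scale each eigenvector (column) by 1 / sqrt(eigenvalue).
    P = eigvecs / np.sqrt(eigvals + eps)
    return mean, P

def apply_whitening(X, mean, P):
    """Center, project, and L2-normalize descriptors (one per row)."""
    Y = (X - mean) @ P
    norms = np.linalg.norm(Y, axis=1, keepdims=True)
    return Y / np.maximum(norms, 1e-12)
```

After whitening, the projected training descriptors have approximately identity covariance, so no single correlated direction dominates the Euclidean distance between patches. A supervised variant would instead estimate the transform from labeled matching and non-matching patch pairs.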
