DisLocation: Scalable Descriptor Distinctiveness for Location Recognition

The objective of this paper is to improve large-scale visual object retrieval for visual place recognition. Geo-localization based on a visual query is made difficult by the many non-distinctive features that commonly occur in imagery of urban environments, such as generic modern windows, doors, cars, and trees. The focus of this work is to adapt the standard Hamming Embedding retrieval system to account for varying descriptor distinctiveness. To this end, we propose a novel method for efficiently estimating the distinctiveness of all database descriptors, based on estimating local descriptor density everywhere in the descriptor space. In contrast to all competing methods, the (unsupervised) training time for our method (DisLoc) is linear in the number of database descriptors and takes only 100 s on a single CPU core for a 1 million image database. Furthermore, the added memory requirements are negligible (1%).
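
The general idea of weighting descriptors by local density can be illustrated with a minimal sketch. This is not the DisLoc algorithm itself, whose details are not given in the abstract. It assumes descriptors are already reduced to binary signatures, uses the first few bits of each signature as a coarse cell (a hypothetical partitioning chosen only for illustration), counts how many database descriptors fall in each cell in a single linear pass, and assigns each descriptor a weight inversely proportional to that count. The names `cell_ids`, `distinctiveness_weights` and `cell_bits` are illustrative assumptions.

```python
import numpy as np


def cell_ids(signatures: np.ndarray, cell_bits: int) -> np.ndarray:
    """Map each binary signature (a row of 0/1 values) to a coarse cell index.

    The cell index is formed from the first `cell_bits` bits of the signature.
    This partitioning is an assumption made for illustration only.
    """
    powers = 1 << np.arange(cell_bits, dtype=np.int64)
    return signatures[:, :cell_bits].astype(np.int64) @ powers


def distinctiveness_weights(signatures: np.ndarray, cell_bits: int = 8) -> np.ndarray:
    """Return one weight per descriptor, larger in sparsely populated cells.

    One pass computes the cell indices, one pass counts cell occupancy,
    so the cost grows linearly with the number of descriptors.
    """
    ids = cell_ids(signatures, cell_bits)
    counts = np.bincount(ids, minlength=1 << cell_bits)
    density = counts[ids].astype(np.float64)  # occupancy of each descriptor's own cell
    weights = 1.0 / density                   # crowded cell -> low weight
    return weights / weights.mean()           # normalise so the mean weight is 1


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical database: 1000 random 64-bit signatures.
    database = rng.integers(0, 2, size=(1000, 64), dtype=np.uint8)
    w = distinctiveness_weights(database, cell_bits=8)
    print(w.min(), w.max())
```

In a retrieval system, weights of this kind would scale the contribution of each matched descriptor to the image score, so that matches on common structures count for less than matches on rare ones. A real density estimate would look at the neighbourhood of each descriptor rather than at a fixed cell, which the sketch deliberately avoids for brevity.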
