Attention-Based Query Expansion Learning

Query expansion is a technique widely used in image search consisting in combining highly ranked images from an original query into an expanded query that is then reissued, generally leading to increased recall and precision. An important aspect of query expansion is choosing an appropriate way to combine the images into a new query. Interestingly, despite the undeniable empirical success of query expansion, ad-hoc methods with different caveats have dominated the landscape, and not a lot of research has been done on learning how to do query expansion. In this paper we propose a more principled framework to query expansion, where one trains, in a discriminative manner, a model that learns how images should be aggregated to form the expanded query. Within this framework, we propose a model that leverages a self-attention mechanism to effectively learn how to transfer information between the different images before aggregating them. Our approach obtains higher accuracy than existing approaches on standard benchmarks. More importantly, our approach is the only one that consistently shows high accuracy under different regimes, overcoming caveats of existing methods.

[1]  Tomás Pajdla,et al.  NetVLAD: CNN Architecture for Weakly Supervised Place Recognition , 2018, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[2]  Yinzheng Gu,et al.  Attention-aware Generalized Mean Pooling for Image Retrieval , 2018, ArXiv.

[3]  Yannis Avrithis,et al.  Efficient Diffusion on Region Manifolds: Recovering Small Objects with Compact CNN Representations , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[4]  Kilian Q. Weinberger,et al.  Distance Metric Learning for Large Margin Nearest Neighbor Classification , 2005, NIPS.

[5]  Luc Van Gool,et al.  Hello neighbor: Accurate object retrieval with k-reciprocal nearest neighbors , 2011, CVPR 2011.

[6]  Krystian Mikolajczyk,et al.  SOLAR: Second-Order Loss and Attention for Image Retrieval , 2020, ECCV.

[7]  Lukasz Kaiser,et al.  Attention is All you Need , 2017, NIPS.

[8]  Michael Isard,et al.  Object retrieval with large vocabularies and fast spatial matching , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[9]  Michael Isard,et al.  Lost in quantization: Improving particular object retrieval in large scale image databases , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[10]  Bastian Leibe,et al.  Discovering favorite views of popular places with iconoid shift , 2011, 2011 International Conference on Computer Vision.

[11]  Bohyung Han,et al.  Large-Scale Image Retrieval with Attentive Deep Local Features , 2016, 2017 IEEE International Conference on Computer Vision (ICCV).

[12]  Michael Isard,et al.  Total Recall: Automatic Query Expansion with a Generative Feature Model for Object Retrieval , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[13]  Jan-Michael Frahm,et al.  Reconstructing the world* in six days , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[14]  Jan-Michael Frahm,et al.  Structure-from-Motion Revisited , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[15]  J. J. Rocchio,et al.  Relevance feedback in information retrieval , 1971 .

[16]  Jiri Matas,et al.  Total recall II: Query expansion revisited , 2011, CVPR 2011.

[17]  Andrew Zisserman,et al.  Three things everyone should know to improve object retrieval , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[18]  Kurt Keutzer,et al.  Q-BERT: Hessian Based Ultra Low Precision Quantization of BERT , 2020, AAAI.

[19]  Yannis Avrithis,et al.  Revisiting Oxford and Paris: Large-Scale Image Retrieval Benchmarking , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[20]  Abhinav Gupta,et al.  Non-local Neural Networks , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[21]  Marinos Ioannides,et al.  In the wild image retrieval and clustering for 3D cultural heritage landmarks reconstruction , 2014, Multimedia Tools and Applications.

[22]  Ming-Wei Chang,et al.  BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding , 2019, NAACL.

[23]  Torsten Sattler,et al.  Image Retrieval for Image-Based Localization Revisited , 2012, BMVC.

[24]  Yann LeCun,et al.  Dimensionality Reduction by Learning an Invariant Mapping , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[25]  Hervé Jégou,et al.  Visual query expansion with or without geometry: Refining local descriptors by feature aggregation , 2014, Pattern Recognit..

[26]  Maksims Volkovs,et al.  Explore-Exploit Graph Traversal for Image Retrieval , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[27]  Thomas Wolf,et al.  DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter , 2019, ArXiv.

[28]  Albert Gordo,et al.  End-to-End Learning of Deep Visual Representations for Image Retrieval , 2016, International Journal of Computer Vision.

[29]  Hongwei Zhao,et al.  Image Retrieval Based on Learning to Rank and Multiple Loss , 2019, ISPRS Int. J. Geo Inf..

[30]  Cordelia Schmid,et al.  Hamming Embedding and Weak Geometric Consistency for Large Scale Image Search , 2008, ECCV.

[31]  Miroslaw Bober,et al.  REMAP: Multi-Layer Entropy-Guided Pooling of Dense CNN Features for Image Retrieval , 2019, IEEE Transactions on Image Processing.

[32]  Ying Wu,et al.  Spatially-Constrained Similarity Measurefor Large-Scale Object Retrieval , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[33]  Deva Ramanan,et al.  Attentional Pooling for Action Recognition , 2017, NIPS.

[34]  Andrew Zisserman,et al.  Video Google: a text retrieval approach to object matching in videos , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[35]  Panu Turcot,et al.  Better matching with fewer features: The selection of useful features in large database recognition problems , 2009, 2009 IEEE 12th International Conference on Computer Vision Workshops, ICCV Workshops.

[36]  Christopher D. Manning,et al.  Introduction to Information Retrieval , 2010, J. Assoc. Inf. Sci. Technol..

[37]  Stefanos Zafeiriou,et al.  ArcFace: Additive Angular Margin Loss for Deep Face Recognition , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[38]  Laurent Amsaleg,et al.  Image retrieval with reciprocal and shared nearest neighbors , 2014, 2014 International Conference on Computer Vision Theory and Applications (VISAPP).

[39]  Akshay Deepak,et al.  Query Expansion Techniques for Information Retrieval: a Survey , 2017, Inf. Process. Manag..

[40]  Yannis Avrithis,et al.  Image Search with Selective Match Kernels: Aggregation Across Single and Multiple Images , 2016, International Journal of Computer Vision.

[41]  Jiri Matas,et al.  Image Retrieval for Online Browsing in Large Image Collections , 2013, SISAP.

[42]  Giorgos Tolias,et al.  Fine-Tuning CNN Image Retrieval with No Human Annotation , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[43]  Miroslaw Bober,et al.  ACTNET: End-to-End Learning of Feature Activations and Multi-stream Aggregation for Effective Instance Image Retrieval , 2019, International Journal of Computer Vision.

[44]  Yannis Avrithis,et al.  VIRaL: Visual Image Retrieval and Localization , 2010, Multimedia Tools and Applications.

[45]  M. E. Maron,et al.  On Relevance, Probabilistic Indexing and Information Retrieval , 1960, JACM.

[46]  Jan-Michael Frahm,et al.  Reconstructing the World* in Six Days *(As Captured by the Yahoo 100 Million Image Dataset) , 2015, CVPR 2015.

[47]  Stefano Alletto,et al.  Exploring Architectural Details Through a Wearable Egocentric Vision Device , 2016, Sensors.

[48]  Jon Almazán,et al.  Learning With Average Precision: Training Image Retrieval With a Listwise Loss , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[49]  Maksims Volkovs,et al.  Guided Similarity Separation for Image Retrieval , 2019, NeurIPS.

[50]  Ilya Sutskever,et al.  Language Models are Unsupervised Multitask Learners , 2019 .

[51]  Jaewoo Kang,et al.  Self-Attention Graph Pooling , 2019, ICML.