Learning With Average Precision: Training Image Retrieval With a Listwise Loss

Image retrieval can be formulated as a ranking problem where the goal is to order database images by decreasing similarity to the query. Recent deep models for image retrieval have outperformed traditional methods by leveraging ranking-tailored loss functions, but important theoretical and practical problems remain. First, rather than directly optimizing the global ranking, they minimize an upper bound on the essential loss, which does not necessarily result in an optimal mean average precision (mAP). Second, these methods require significant engineering effort to work well, e.g., special pre-training and hard-negative mining. In this paper we instead propose to directly optimize the global mAP by leveraging recent advances in listwise loss formulations. Using a histogram binning approximation, average precision (AP) becomes differentiable and can thus be used for end-to-end learning. Compared to existing losses, the proposed method considers thousands of images simultaneously at each iteration and eliminates the need for ad hoc tricks. It also establishes a new state of the art on many standard retrieval benchmarks. Models and evaluation scripts have been made available at: https://europe.naverlabs.com/Deep-Image-Retrieval/.
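The key step above, making AP differentiable through histogram binning, can be illustrated with a small pure-Python sketch. It is not the authors' released implementation: the score range [-1, 1] (cosine similarities), the evenly spaced bin centres, the triangular soft-assignment kernel, the default of 20 bins, and the helper names `exact_ap`, `binned_ap` and `finite_diff` are all assumptions made for illustration. Exact AP depends only on the sorted order of the scores, so it is piecewise constant in them; the binned estimate replaces hard ranks with soft per-bin counts, so it changes as a score moves.

```python
"""Exact AP vs. a histogram-binned (soft-quantized) AP approximation.

Illustrative sketch only. Assumed, not taken from the abstract: scores are
cosine similarities in [-1, 1], bin centres are evenly spaced from 1 down to
-1, and each score is split between its two nearest bins by a triangular kernel.
"""
from typing import Callable, Sequence


def exact_ap(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AP of the ranking obtained by sorting scores in decreasing order."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    hits, total = 0, 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i]:
            hits += 1
            total += hits / rank  # precision at this positive's rank
    return total / hits if hits else 0.0


def binned_ap(scores: Sequence[float], labels: Sequence[int],
              num_bins: int = 20) -> float:
    """Soft-binned AP estimate; it varies continuously with the scores."""
    if num_bins < 2:
        raise ValueError("need at least two bins")
    n_pos = sum(1 for y in labels if y)
    if n_pos == 0:
        return 0.0
    width = 2.0 / (num_bins - 1)
    centres = [1.0 - m * width for m in range(num_bins)]  # highest score first
    hist_all = [0.0] * num_bins  # soft count of all items per bin
    hist_pos = [0.0] * num_bins  # soft count of positives per bin
    for s, y in zip(scores, labels):
        for m, c in enumerate(centres):
            w = max(0.0, 1.0 - abs(s - c) / width)  # triangular kernel weight
            hist_all[m] += w
            if y:
                hist_pos[m] += w
    ap, cum_all, cum_pos = 0.0, 0.0, 0.0
    for m in range(num_bins):
        cum_all += hist_all[m]
        cum_pos += hist_pos[m]
        if hist_pos[m] > 0.0:
            precision = cum_pos / cum_all      # soft precision down to bin m
            recall_step = hist_pos[m] / n_pos  # soft recall gained in bin m
            ap += precision * recall_step
    return ap


def finite_diff(fn: Callable[..., float], scores: Sequence[float],
                labels: Sequence[int], i: int, eps: float = 1e-4) -> float:
    """Central-difference derivative of fn with respect to scores[i]."""
    up = list(scores)
    up[i] += eps
    down = list(scores)
    down[i] -= eps
    return (fn(up, labels) - fn(down, labels)) / (2 * eps)


if __name__ == "__main__":
    scores = [0.9, 0.8, 0.3, -0.2, -0.7]
    labels = [1, 0, 1, 0, 0]
    print("exact AP :", exact_ap(scores, labels))
    print("binned AP:", binned_ap(scores, labels))
    # Nudging the negative at index 1 does not change the ranking, so the
    # exact AP is flat there, while the binned AP responds to the change.
    print("d exact / d s1 :", finite_diff(exact_ap, scores, labels, 1))
    print("d binned / d s1:", finite_diff(binned_ap, scores, labels, 1))
```

In an actual training loop the same computation would be written in an autodiff framework so that gradients flow back into the embedding network; the finite-difference helper here only shows that the binned estimate reacts to small score changes where the exact AP does not. Finer bins generally narrow the gap to the exact AP, since fewer positives and negatives end up sharing a bin, at the cost of more computation per query.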
