Paper information - Smooth-AP: Smoothing the Path Towards Large-Scale Image Retrieval

Optimising a ranking-based metric, such as Average Precision (AP), is notoriously challenging because it is non-differentiable and hence cannot be optimised directly using gradient-descent methods. To this end, we introduce an objective that instead optimises a smoothed approximation of AP, coined Smooth-AP. Smooth-AP is a plug-and-play objective function that allows for end-to-end training of deep networks with a simple and elegant implementation. We also present an analysis of why directly optimising the ranking-based metric of AP offers benefits over other deep metric learning losses. We apply Smooth-AP to standard retrieval benchmarks, Stanford Online Products and VehicleID, and also evaluate on larger-scale datasets: iNaturalist for fine-grained category retrieval, and VGGFace2 and IJB-C for face retrieval. In all cases, we improve the performance over the state-of-the-art, especially for larger-scale datasets, thus demonstrating the effectiveness and scalability of Smooth-AP to real-world scenarios.
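The abstract does not spell out the approximation, so the following is only a minimal sketch of the general idea. It assumes the smoothing replaces the step function used to compute ranks with a temperature-scaled sigmoid. The function names (`sigmoid`, `exact_ap`, `smooth_ap`) and the temperature parameter `tau` are hypothetical and chosen for illustration.

```python
import math


def sigmoid(x, tau):
    """Temperature-scaled logistic function: a smooth stand-in for the step function."""
    z = x / tau
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)  # avoids overflow for large negative z
    return e / (1.0 + e)


def exact_ap(scores, labels):
    """Exact AP for one query: mean over positives of (rank among positives) / (rank among all)."""
    n = len(scores)
    positives = [i for i in range(n) if labels[i]]
    total = 0.0
    for i in positives:
        # Ranks are 1 plus the number of items scored strictly higher than item i.
        rank_pos = 1 + sum(1 for j in positives if j != i and scores[j] > scores[i])
        rank_all = 1 + sum(1 for j in range(n) if j != i and scores[j] > scores[i])
        total += rank_pos / rank_all
    return total / len(positives)


def smooth_ap(scores, labels, tau=0.01):
    """Smoothed AP (assumed form): each hard comparison becomes sigmoid((s_j - s_i) / tau)."""
    n = len(scores)
    positives = [i for i in range(n) if labels[i]]
    total = 0.0
    for i in positives:
        rank_pos = 1 + sum(sigmoid(scores[j] - scores[i], tau) for j in positives if j != i)
        rank_all = 1 + sum(sigmoid(scores[j] - scores[i], tau) for j in range(n) if j != i)
        total += rank_pos / rank_all
    return total / len(positives)


# Toy query: similarity scores of four gallery items and their relevance labels.
scores = [0.9, 0.8, 0.7, 0.6]
labels = [1, 0, 1, 0]
hard = exact_ap(scores, labels)
soft = smooth_ap(scores, labels, tau=0.01)
```

With a small `tau` the smoothed value approaches the exact AP, while a larger `tau` gives a looser approximation that varies more gradually with the scores. This sketch only evaluates the quantity. For training, the same expression would be written in an automatic-differentiation framework and a loss such as 1 minus the smoothed AP would be minimised.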

Andrew Zisserman | Andrew Brown | Weidi Xie | Vicky Kalogeiton
