Smooth-AP: Smoothing the Path Towards Large-Scale Image Retrieval

Optimising a ranking-based metric, such as Average Precision (AP), is notoriously challenging due to the fact that it is non-differentiable, and hence cannot be optimised directly using gradient-descent methods. To this end, we introduce an objective that optimises instead a smoothed approximation of AP, coined Smooth-AP. Smooth-AP is a plug-and-play objective function that allows for end-to-end training of deep networks with a simple and elegant implementation. We also present an analysis for why directly optimising the ranking based metric of AP offers benefits over other deep metric learning losses. We apply Smooth-AP to standard retrieval benchmarks: Stanford Online products and VehicleID, and also evaluate on larger-scale datasets: INaturalist for fine-grained category retrieval, and VGGFace2 and IJB-C for face retrieval. In all cases, we improve the performance over the state-of-the-art, especially for larger-scale datasets, thus demonstrating the effectiveness and scalability of Smooth-AP to real-world scenarios.

[1]  P. Pérez,et al.  SoDeep: A Sorting Deep Net to Learn Ranking Loss Surrogates , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[2]  Bohyung Han,et al.  Stochastic Class-Based Hard Example Mining for Deep Metric Learning , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[3]  Kun He,et al.  Hashing as Tie-Aware Learning to Rank , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[4]  Xudong Lin,et al.  Deep Adversarial Metric Learning , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[5]  Jungmin Lee,et al.  Attention-based Ensemble for Deep Metric Learning , 2018, ECCV.

[6]  Chao Zhang,et al.  Hard-Aware Deeply Cascaded Embedding , 2016, 2017 IEEE International Conference on Computer Vision (ICCV).

[7]  Filip Radlinski,et al.  A support vector method for optimizing average precision , 2007, SIGIR.

[8]  O. Chapelle Large margin optimization of ranking measures , 2007 .

[9]  Quoc V. Le,et al.  Learning to Rank with Nonsmooth Cost Functions , 2006, Neural Information Processing Systems.

[10]  James Philbin,et al.  FaceNet: A unified embedding for face recognition and clustering , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[11]  Yann LeCun,et al.  Learning a similarity metric discriminatively, with application to face verification , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[12]  Robert Pless,et al.  Deep Randomized Ensembles for Metric Learning , 2018, ECCV.

[13]  Yang Song,et al.  Learning Fine-Grained Image Similarity with Deep Ranking , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[14]  C. V. Jawahar,et al.  Efficient Optimization for Rank-Based Loss Functions , 2016, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[15]  Christopher D. Manning,et al.  Introduction to Information Retrieval , 2010, J. Assoc. Inf. Sci. Technol..

[16]  Weilin Huang,et al.  Cross-Batch Memory for Embedding Learning , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[17]  Jiri Matas,et al.  Total recall II: Query expansion revisited , 2011, CVPR 2011.

[18]  Tao Qin,et al.  A general approximation framework for direct optimization of information retrieval measures , 2010, Information Retrieval.

[19]  Yan Lu,et al.  Local Descriptors Optimized for Average Precision , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[20]  Gert R. G. Lanckriet,et al.  Metric Learning to Rank , 2010, ICML.

[21]  Yang Hua,et al.  Ranked List Loss for Deep Metric Learning , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[22]  Cordelia Schmid,et al.  Product Quantization for Nearest Neighbor Search , 2011, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[23]  Michael S. Bernstein,et al.  ImageNet Large Scale Visual Recognition Challenge , 2014, International Journal of Computer Vision.

[24]  Horst Possegger,et al.  BIER — Boosting Independent Embeddings Robustly , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[25]  Qi Qian,et al.  SoftTriple Loss: Deep Metric Learning Without Triplet Sampling , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[26]  Jing Lu,et al.  Sampling Wisely: Deep Image Embedding by Top-K Precision Optimization , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[27]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[28]  Georg Martius,et al.  Differentiation of Blackbox Combinatorial Solvers , 2020, ICLR.

[29]  Gustavo Carneiro,et al.  Smart Mining for Deep Metric Learning , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[30]  Cordelia Schmid,et al.  Hamming Embedding and Weak Geometric Consistency for Large Scale Image Search , 2008, ECCV.

[31]  Anil K. Jain,et al.  IARPA Janus Benchmark - C: Face Dataset and Protocol , 2018, 2018 International Conference on Biometrics (ICB).

[32]  Omkar M. Parkhi,et al.  VGGFace2: A Dataset for Recognising Faces across Pose and Age , 2017, 2018 13th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2018).

[33]  Alexander J. Smola,et al.  Sampling Matters in Deep Embedding Learning , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[34]  Victor S. Lempitsky,et al.  Learning Deep Embeddings with Histogram Loss , 2016, NIPS.

[35]  Yair Movshovitz-Attias,et al.  No Fuss Distance Metric Learning Using Proxies , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[36]  Andrew Zisserman,et al.  Video Google: a text retrieval approach to object matching in videos , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[37]  Andrew Zisserman,et al.  All About VLAD , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[38]  Stephen E. Robertson,et al.  SoftRank: optimizing non-smooth rank metrics , 2008, WSDM '08.

[39]  Julian Martin Eisenschlos,et al.  SoftSort: A Continuous Relaxation for the argsort Operator , 2020, ICML.

[40]  Jian Wang,et al.  Deep Metric Learning with Angular Loss , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[41]  Matthew R. Scott,et al.  Multi-Similarity Loss With General Pair Weighting for Deep Metric Learning , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[42]  Silvio Savarese,et al.  Deep Metric Learning via Lifted Structured Feature Embedding , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[43]  Chin-Hui Lee,et al.  A maximal figure-of-merit learning approach to maximizing mean average precision with deep neural network based classifiers , 2014, 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[44]  Ling-Yu Duan,et al.  Towards Accurate One-Stage Object Detection With AP-Loss , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[45]  Albert Gordo,et al.  Deep Image Retrieval: Learning Global Representations for Image Search , 2016, ECCV.

[46]  Stan Sclaroff,et al.  Deep Metric Learning to Rank , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[47]  Claudio Michaelis,et al.  Optimizing Rank-Based Metrics With Blackbox Differentiation , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[48]  Stefanie Jegelka,et al.  Deep Metric Learning via Facility Location , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[49]  Enhua Wu,et al.  Squeeze-and-Excitation Networks , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[50]  Florent Perronnin,et al.  Large-scale image retrieval with compressed Fisher vectors , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[51]  Karsten Roth,et al.  MIC: Mining Interclass Characteristics for Improved Metric Learning , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[52]  Jimmy Ba,et al.  Adam: A Method for Stochastic Optimization , 2014, ICLR.

[53]  Bhiksha Raj,et al.  SphereFace: Deep Hypersphere Embedding for Face Recognition , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[54]  Tomás Pajdla,et al.  NetVLAD: CNN Architecture for Weakly Supervised Place Recognition , 2018, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[55]  Tie-Yan Liu,et al.  Learning to rank: from pairwise approach to listwise approach , 2007, ICML '07.

[56]  Kilian Q. Weinberger,et al.  Distance Metric Learning for Large Margin Nearest Neighbor Classification , 2005, NIPS.

[57]  Stefanos Zafeiriou,et al.  ArcFace: Additive Angular Margin Loss for Deep Face Recognition , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[58]  Cordelia Schmid,et al.  Aggregating local descriptors into a compact image representation , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[59]  Cordelia Schmid,et al.  Aggregating Local Image Descriptors into Compact Codes , 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[60]  Raquel Urtasun,et al.  Deep Spectral Clustering Learning , 2017, ICML.

[61]  Weilin Huang,et al.  Deep Metric Learning with Hierarchical Triplet Loss , 2018, ECCV.

[62]  Lucas Beyer,et al.  In Defense of the Triplet Loss for Person Re-Identification , 2017, ArXiv.

[63]  Andrew Zisserman,et al.  AutoCorrect: Deep Inductive Alignment of Noisy Geometric Annotations , 2019, BMVC.

[64]  Chu-Song Chen,et al.  Cross-batch Reference Learning for Deep Classification and Retrieval , 2016, ACM Multimedia.

[65]  John Guiver,et al.  Learning to rank with SoftRank and Gaussian processes , 2008, SIGIR '08.

[66]  Pietro Perona,et al.  The Caltech-UCSD Birds-200-2011 Dataset , 2011 .

[67]  Yang Song,et al.  The iNaturalist Species Classification and Detection Dataset , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[68]  Björn Ommer,et al.  Divide and Conquer the Embedding Space for Metric Learning , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[69]  David Stutz,et al.  Neural Codes for Image Retrieval , 2015 .

[70]  Jiwen Lu,et al.  Learning Globally Optimized Object Detector via Policy Gradient , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[71]  Michael Isard,et al.  Object retrieval with large vocabularies and fast spatial matching , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[72]  Ondrej Chum,et al.  CNN Image Retrieval Learns from BoW: Unsupervised Fine-Tuning with Hard Examples , 2016, ECCV.

[73]  Jon Almazán,et al.  Learning With Average Precision: Training Image Retrieval With a Listwise Loss , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[74]  Vittorio Ferrari,et al.  End-to-End Training of Object Class Detectors for Mean Average Precision , 2016, ACCV.