Data-Efficient Ranking Distillation for Image Retrieval

Recent advances in deep learning have led to rapid progress in the field of image retrieval. However, the best-performing architectures incur significant computational cost. Recent approaches tackle this issue by using knowledge distillation to transfer knowledge from a deeper, heavier architecture to a much smaller network. In this paper we address knowledge distillation for metric learning problems. Unlike previous approaches, our proposed method jointly satisfies the following constraints: i) a limited number of queries to the teacher model, ii) a black-box teacher exposing only its final output representation, and iii) a small fraction of the original training data, without any ground-truth labels. In addition, the distillation method does not require the student and teacher representations to have the same dimensionality. Addressing these constraints reduces computational requirements and the dependency on large-scale training datasets, and covers practical scenarios of limited or partial access to private assets such as the teacher model or the corresponding training data and labels. The key idea is to augment the original training set with additional samples obtained by linear interpolation in the final output representation space. Distillation is then performed in the joint space of original and augmented teacher-student sample representations. Results demonstrate that our approach can match baseline models trained with full supervision. In low-training-sample settings, our approach outperforms the fully supervised approach on two challenging image retrieval datasets, ROxford5k and RParis6k \cite{Roxf}, with the least possible teacher supervision.
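A minimal PyTorch sketch of the two core steps described above, under stated assumptions: embeddings for the small unlabeled sample set have already been collected from the black-box teacher, interpolation uses a mixup-style Beta-distributed coefficient, and a pairwise-similarity matching loss stands in for the paper's ranking objective. All names here (interpolate_pairs, relational_distill_loss, num_aug, alpha) are illustrative, not the authors' implementation.

```python
import torch
import torch.nn.functional as F


def interpolate_pairs(embs, idx_a, idx_b, lam):
    """Mixup-style linear interpolation between randomly paired embeddings."""
    return lam * embs[idx_a] + (1.0 - lam) * embs[idx_b]


def relational_distill_loss(student_embs, teacher_embs, num_aug=64, alpha=1.0):
    """Distillation loss over the joint set of original and interpolated
    embeddings; comparing pairwise similarity matrices (rather than raw
    vectors) lets the student and teacher dimensionalities differ."""
    n, device = student_embs.size(0), student_embs.device
    idx_a = torch.randint(0, n, (num_aug,), device=device)
    idx_b = torch.randint(0, n, (num_aug,), device=device)
    # One Beta-distributed mixing coefficient per augmented sample.
    lam = torch.distributions.Beta(alpha, alpha).sample((num_aug, 1)).to(device)

    # Joint space: original samples plus augmented ones, built from the
    # same index pairs and coefficients on the student and teacher sides.
    s = torch.cat([student_embs, interpolate_pairs(student_embs, idx_a, idx_b, lam)])
    t = torch.cat([teacher_embs, interpolate_pairs(teacher_embs, idx_a, idx_b, lam)])

    s = F.normalize(s, dim=1)
    t = F.normalize(t, dim=1)
    # Align the student's cosine-similarity structure with the teacher's.
    return F.mse_loss(s @ s.t(), t @ t.t())
```

Note that interpolating the cached teacher outputs, rather than the input images, means the augmented targets cost no additional teacher queries, which is what makes the approach viable under the limited-query constraint.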

[1] Yoshua Bengio et al. Interpolation Consistency Training for Semi-Supervised Learning. IJCAI, 2019.

[2] Florent Perronnin et al. Fisher Vectors Meet Neural Networks: A Hybrid Classification Architecture. CVPR, 2015.

[3] Geoffrey E. Hinton et al. ImageNet Classification with Deep Convolutional Neural Networks. Commun. ACM, 2012.

[4] Hongyi Zhang et al. mixup: Beyond Empirical Risk Minimization. ICLR, 2017.

[5] Yan Lu et al. Relational Knowledge Distillation. CVPR, 2019.

[6] Zachary Chase Lipton et al. Born Again Neural Networks. ICML, 2018.

[7] Nikos Komodakis et al. Paying More Attention to Attention: Improving the Performance of Convolutional Neural Networks via Attention Transfer. ICLR, 2016.

[8] Ioannis Mitliagkas et al. Manifold Mixup: Better Representations by Interpolating Hidden States. ICML, 2018.

[9] L. Breiman et al. Born Again Trees. 1996.

[10] Jan-Michael Frahm et al. From Dusk Till Dawn: Modeling in the Dark. CVPR, 2016.

[11] Jian Sun et al. Deep Residual Learning for Image Recognition. CVPR, 2016.

[12] Stefan Carlsson et al. CNN Features Off-the-Shelf: An Astounding Baseline for Recognition. CVPR Workshops, 2014.

[13] Naiyan Wang et al. Like What You Like: Knowledge Distill via Neuron Selectivity Transfer. arXiv, 2017.

[14] Silvio Savarese et al. Deep Metric Learning via Lifted Structured Feature Embedding. CVPR, 2016.

[15] Tomás Pajdla et al. NetVLAD: CNN Architecture for Weakly Supervised Place Recognition. IEEE TPAMI, 2018.

[16] Jimmy Ba et al. Adam: A Method for Stochastic Optimization. ICLR, 2014.

[17] Andrew Zisserman et al. Video Google: A Text Retrieval Approach to Object Matching in Videos. ICCV, 2003.

[18] Andrew Zisserman et al. All About VLAD. CVPR, 2013.

[19] Yannis Avrithis et al. Revisiting Oxford and Paris: Large-Scale Image Retrieval Benchmarking. CVPR, 2018.

[20] Thomas Mensink et al. Image Classification with the Fisher Vector: Theory and Practice. IJCV, 2013.

[21] Ronan Sicre et al. Particular Object Retrieval with Integral Max-Pooling of CNN Activations. ICLR, 2015.

[22] Rich Caruana et al. Model Compression. KDD, 2006.

[23] Jiri Matas et al. Efficient Representation of Local Geometry for Large Scale Object Retrieval. CVPR, 2009.

[24] Giorgos Tolias et al. Fine-Tuning CNN Image Retrieval with No Human Annotation. IEEE TPAMI, 2017.

[25] Kyunghyun Cho et al. Retrieval-Augmented Convolutional Neural Networks Against Adversarial Examples. CVPR, 2019.

[26] Kaiqi Huang et al. Beyond Triplet Loss: A Deep Quadruplet Network for Person Re-identification. CVPR, 2017.

[27] Jan-Michael Frahm et al. From Single Image Query to Detailed 3D Reconstruction. CVPR, 2015.

[28] David Stutz et al. Neural Codes for Image Retrieval. 2015.

[29] Michael Isard et al. Object Retrieval with Large Vocabularies and Fast Spatial Matching. CVPR, 2007.

[30] Dongdong Wang et al. Neural Networks Are More Productive Teachers Than Human Raters: Active Mixup for Data-Efficient Knowledge Distillation from a Blackbox Model. CVPR, 2020.

[31] Michael Isard et al. Lost in Quantization: Improving Particular Object Retrieval in Large Scale Image Databases. CVPR, 2008.

[32] Menglong Zhu et al. Detect-To-Retrieve: Efficient Regional Aggregation for Image Search. CVPR, 2019.

[33] Geoffrey E. Hinton et al. Distilling the Knowledge in a Neural Network. arXiv, 2015.

[34] Andrew Zisserman et al. Very Deep Convolutional Networks for Large-Scale Image Recognition. ICLR, 2014.

[35] Zhaoxiang Zhang et al. DarkRank: Accelerating Deep Metric Learning via Cross Sample Similarities Transfer. AAAI, 2017.

[36] Ali Farhadi et al. Label Refinery: Improving ImageNet Classification through Label Progression. arXiv, 2018.

[37] Bohyung Han et al. Large-Scale Image Retrieval with Attentive Deep Local Features. ICCV, 2017.

[38] Albert Gordo et al. Deep Image Retrieval: Learning Global Representations for Image Search. ECCV, 2016.

[39] Andrew Gordon Wilson et al. Averaging Weights Leads to Wider Optima and Better Generalization. UAI, 2018.

[40] Ondrej Chum et al. CNN Image Retrieval Learns from BoW: Unsupervised Fine-Tuning with Hard Examples. ECCV, 2016.

[41] Jon Almazán et al. Learning with Average Precision: Training Image Retrieval with a Listwise Loss. ICCV, 2019.

[42] Tatsuya Harada et al. Between-Class Learning for Image Classification. CVPR, 2018.

[43] Svetlana Lazebnik et al. Multi-scale Orderless Pooling of Deep Convolutional Activation Features. ECCV, 2014.

[44] Hervé Jégou et al. Negative Evidences and Co-occurences in Image Retrieval: The Benefit of PCA and Whitening. ECCV, 2012.

[45] David G. Lowe. Distinctive Image Features from Scale-Invariant Keypoints. IJCV, 2004.