Attention-based Ensemble for Deep Metric Learning

Deep metric learning aims to learn an embedding function, modeled as a deep neural network, that maps semantically similar images close together and dissimilar images far apart in the learned embedding space. Recently, ensemble methods have been applied to deep metric learning and yield state-of-the-art results. One important aspect of an ensemble is that its learners should be diverse in their feature embeddings. To this end, we propose an attention-based ensemble that uses multiple attention masks so that each learner can attend to different parts of the object. We also propose a divergence loss that encourages diversity among the learners. The proposed method is evaluated on the standard benchmarks of deep metric learning, and experimental results show that it outperforms state-of-the-art methods by a significant margin on image retrieval tasks.
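
As a rough illustration of the idea described above (a sketch, not the authors' implementation), the code below applies M learner-specific attention masks to a shared feature map, pools each masked map into its own L2-normalized embedding, and adds a divergence penalty that pushes the M embeddings of the same image apart. The module names, the 1x1-convolution attention blocks, the pooling choice, and the hinge-style form of the divergence term are all assumptions made for the example.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class AttentionEnsembleHead(nn.Module):
    """Sketch of an attention-based ensemble head: a shared feature map is
    modulated by M learner-specific attention masks, and each masked map is
    pooled into its own embedding. Architecture details are illustrative."""

    def __init__(self, in_channels=512, embed_dim=128, num_learners=8):
        super().__init__()
        # One lightweight attention module per learner; each produces a
        # per-location, per-channel mask in [0, 1].
        self.attention = nn.ModuleList([
            nn.Sequential(nn.Conv2d(in_channels, in_channels, kernel_size=1),
                          nn.Sigmoid())
            for _ in range(num_learners)
        ])
        # One linear embedding head per learner.
        self.embed = nn.ModuleList([
            nn.Linear(in_channels, embed_dim) for _ in range(num_learners)
        ])

    def forward(self, feat):  # feat: (B, C, H, W) shared convolutional features
        embeddings = []
        for attn, embed in zip(self.attention, self.embed):
            masked = feat * attn(feat)                      # each learner attends to different parts
            pooled = masked.mean(dim=(2, 3))                # global average pooling -> (B, C)
            embeddings.append(F.normalize(embed(pooled)))   # L2-normalized embedding (B, D)
        return torch.stack(embeddings, dim=1)               # (B, M, D)


def divergence_loss(embeddings, margin=0.5):
    """Hinge-style penalty that pushes the M embeddings of the *same* image
    away from each other, encouraging the learners to stay diverse."""
    _, m, _ = embeddings.shape
    loss, pairs = embeddings.new_zeros(()), 0
    for i in range(m):
        for j in range(i + 1, m):
            dist = (embeddings[:, i] - embeddings[:, j]).pow(2).sum(dim=1)
            loss = loss + F.relu(margin - dist).mean()
            pairs += 1
    return loss / max(pairs, 1)
```

In training, this divergence term would typically be added, with a weighting factor, to a standard metric learning loss (e.g., contrastive or triplet) computed on each learner's embedding.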
