Directional Statistics-based Deep Metric Learning for Image Classification and Retrieval

Deep distance metric learning (DDML), which is proposed to learn image similarity metrics in an end-to-end manner based on the convolution neural network, has achieved encouraging results in many computer vision tasks.$L2$-normalization in the embedding space has been used to improve the performance of several DDML methods. However, the commonly used Euclidean distance is no longer an accurate metric for $L2$-normalized embedding space, i.e., a hyper-sphere. Another challenge of current DDML methods is that their loss functions are usually based on rigid data formats, such as the triplet tuple. Thus, an extra process is needed to prepare data in specific formats. In addition, their losses are obtained from a limited number of samples, which leads to a lack of the global view of the embedding space. In this paper, we replace the Euclidean distance with the cosine similarity to better utilize the $L2$-normalization, which is able to attenuate the curse of dimensionality. More specifically, a novel loss function based on the von Mises-Fisher distribution is proposed to learn a compact hyper-spherical embedding space. Moreover, a new efficient learning algorithm is developed to better capture the global structure of the embedding space. Experiments for both classification and retrieval tasks on several standard datasets show that our method achieves state-of-the-art performance with a simpler training procedure. Furthermore, we demonstrate that, even with a small number of convolutional layers, our model can still obtain significantly better classification performance than the widely used softmax loss.

[1]  Yu Qiao,et al.  A Discriminative Feature Learning Approach for Deep Face Recognition , 2016, ECCV.

[2]  Liming Chen,et al.  von Mises-Fisher Mixture Model-based Deep learning: Application to Face Verification , 2017, ArXiv.

[3]  Rong Jin,et al.  Fine-grained visual categorization via multi-stage metric learning , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[4]  Li Fei-Fei,et al.  ImageNet: A large-scale hierarchical image database , 2009, CVPR.

[5]  Fei-Fei Li,et al.  Novel Dataset for Fine-Grained Image Categorization : Stanford Dogs , 2012 .

[6]  Richard S. Zemel,et al.  Prototypical Networks for Few-shot Learning , 2017, NIPS.

[7]  Jonathan Krause,et al.  3D Object Representations for Fine-Grained Categorization , 2013, 2013 IEEE International Conference on Computer Vision Workshops.

[8]  Jin Wang,et al.  Equidistance constrained metric learning for person re-identification , 2018, Pattern Recognit..

[9]  Ian D. Reid,et al.  Fast Training of Triplet-Based Deep Binary Embedding Networks , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[10]  Stefan Carlsson,et al.  CNN Features Off-the-Shelf: An Astounding Baseline for Recognition , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition Workshops.

[11]  Sander Dieleman,et al.  Rotation-invariant convolutional neural networks for galaxy morphology prediction , 2015, ArXiv.

[12]  Manohar Paluri,et al.  Metric Learning with Adaptive Density Discrimination , 2015, ICLR.

[13]  Polina Golland,et al.  Discovering structure in the space of fMRI selectivity profiles , 2010, NeuroImage.

[14]  Suvrit Sra,et al.  Directional Statistics in Machine Learning: a Brief Review , 2016, 1605.00316.

[15]  James Philbin,et al.  FaceNet: A unified embedding for face recognition and clustering , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[16]  Ting Liu,et al.  Recent advances in convolutional neural networks , 2015, Pattern Recognit..

[17]  Li Bai,et al.  Cosine Similarity Metric Learning for Face Verification , 2010, ACCV.

[18]  Qing Wang,et al.  Distance metric optimization driven convolutional neural network for age invariant face recognition , 2018, Pattern Recognit..

[19]  Subhransu Maji,et al.  Fine-Grained Visual Classification of Aircraft , 2013, ArXiv.

[20]  Yue Gao,et al.  Multi-View 3D Object Retrieval With Deep Embedding Network , 2016, IEEE Transactions on Image Processing.

[21]  Cordelia Schmid,et al.  Product Quantization for Nearest Neighbor Search , 2011, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[22]  S. R. Jammalamadaka,et al.  Directional Statistics, I , 2011 .

[23]  Sergey Ioffe,et al.  Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift , 2015, ICML.

[24]  Tianbao Yang,et al.  Hyper-class augmented and regularized deep learning for fine-grained image classification , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[25]  Chen Huang,et al.  Learning Deep Representation for Imbalanced Classification , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[26]  C. V. Jawahar,et al.  Cats and dogs , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[27]  David W. Wells The Penguin Dictionary of Curious and Interesting Numbers , 1986 .

[28]  Jiwen Lu,et al.  Discriminative Deep Metric Learning for Face and Kinship Verification , 2017, IEEE Transactions on Image Processing.

[29]  Zheng Zhang,et al.  MXNet: A Flexible and Efficient Machine Learning Library for Heterogeneous Distributed Systems , 2015, ArXiv.

[30]  John W. Fisher,et al.  A Dirichlet Process Mixture Model for Spherical Data , 2015, AISTATS.

[31]  Yu Liu,et al.  Rethinking Feature Discrimination and Polymerization for Large-scale Recognition , 2017, ArXiv.

[32]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[33]  Jianru Xue,et al.  Building discriminative CNN image representations for object retrieval using the replicator equation , 2018, Pattern Recognit..

[34]  Naila Murray,et al.  Generalized Max Pooling , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[35]  Yoshua Bengio,et al.  Understanding the difficulty of training deep feedforward neural networks , 2010, AISTATS.

[36]  Hanjiang Lai,et al.  Instance-Aware Hashing for Multi-Label Image Retrieval , 2016, IEEE Transactions on Image Processing.

[37]  Kihyuk Sohn,et al.  Improved Deep Metric Learning with Multi-class N-pair Loss Objective , 2016, NIPS.

[38]  Suvrit Sra,et al.  A short note on parameter approximation for von Mises-Fisher distributions: and a fast implementation of Is(x) , 2012, Comput. Stat..

[39]  Nanning Zheng,et al.  Person Re-identification by Multi-Channel Parts-Based CNN with Improved Triplet Loss Function , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[40]  Stefanie Jegelka,et al.  Deep Metric Learning via Facility Location , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[41]  Feng Zhou,et al.  Fine-Grained Categorization and Dataset Bootstrapping Using Deep Metric Learning with Humans in the Loop , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[42]  Inderjit S. Dhillon,et al.  Clustering on the Unit Hypersphere using von Mises-Fisher Distributions , 2005, J. Mach. Learn. Res..

[43]  Arnold W. M. Smeulders,et al.  Local Alignments for Fine-Grained Categorization , 2014, International Journal of Computer Vision.

[44]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[45]  Yoshua Bengio,et al.  Gradient-based learning applied to document recognition , 1998, Proc. IEEE.

[46]  Delbert Dueck,et al.  Clustering by Passing Messages Between Data Points , 2007, Science.

[47]  Jianfei Cai,et al.  Compact Representation for Image Classification: To Choose or to Compress? , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[48]  Xiang Yu,et al.  Deep Metric Learning via Lifted Structured Feature Embedding , 2016 .

[49]  Bhiksha Raj,et al.  SphereFace: Deep Hypersphere Embedding for Face Recognition , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[50]  Alex Krizhevsky,et al.  Learning Multiple Layers of Features from Tiny Images , 2009 .

[51]  Shesh N. Rai,et al.  Improving the EM Algorithm , 1993 .

[52]  Song Bai,et al.  Triplet-Center Loss for Multi-view 3D Object Retrieval , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[53]  Pietro Perona,et al.  The Caltech-UCSD Birds-200-2011 Dataset , 2011 .

[54]  Qi Tian,et al.  Part-Based Deep Hashing for Large-Scale Person Re-Identification , 2017, IEEE Transactions on Image Processing.

[55]  Andrew Zisserman,et al.  Automated Flower Classification over a Large Number of Classes , 2008, 2008 Sixth Indian Conference on Computer Vision, Graphics & Image Processing.

[56]  Tieniu Tan,et al.  Deep semantic ranking based hashing for multi-label image retrieval , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).