Deep metric learning has been shown to be highly effective at learning semantic representations and encoding information that can be used to measure data similarity. At the same time, the variational autoencoder (VAE) has been widely used for approximate inference and has shown good performance for directed probabilistic models. However, in a traditional VAE, data label or feature information is intractable. Similarly, traditional representation learning approaches fail to represent many salient aspects of the data. In this project, we propose a novel integrated framework that learns the latent embedding of a VAE by incorporating deep metric learning. The features are learned by optimizing a triplet loss on the mean vectors of the VAE in conjunction with the standard evidence lower bound (ELBO) of the VAE. This approach, which we call the Triplet-based Variational Autoencoder (TVAE), allows us to capture more fine-grained information in the latent embedding. Our model is tested on the MNIST data set and achieves a triplet accuracy of 95.60%, while the traditional VAE (Kingma & Welling, 2013) achieves a triplet accuracy of 75.08%.
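As an illustration of the objective described above, the following is a minimal sketch of how a triplet term on the encoder mean vectors might be combined with the negative ELBO, and of one way triplet accuracy could be computed. It is not the authors' implementation. The function names, the squared Euclidean distance, the hinge form with `margin`, the Bernoulli reconstruction term, the `weight` on the triplet term, and the definition of triplet accuracy as the fraction of triplets with the positive closer to the anchor than the negative are all assumptions made for this example.

```python
import math

def squared_distance(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def triplet_loss(mu_anchor, mu_pos, mu_neg, margin=1.0):
    """Hinge triplet loss on encoder mean vectors (margin is an assumed hyperparameter)."""
    gap = squared_distance(mu_anchor, mu_pos) - squared_distance(mu_anchor, mu_neg)
    return max(0.0, gap + margin)

def kl_to_standard_normal(mu, log_var):
    """KL(N(mu, diag(exp(log_var))) || N(0, I)), the regularizer in the ELBO."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv) for m, lv in zip(mu, log_var))

def bernoulli_recon_loss(x, x_hat, eps=1e-7):
    """Negative Bernoulli log-likelihood (assumed decoder likelihood for pixels in [0, 1])."""
    total = 0.0
    for xi, pi in zip(x, x_hat):
        pi = min(max(pi, eps), 1.0 - eps)  # clamp to avoid log(0)
        total -= xi * math.log(pi) + (1.0 - xi) * math.log(1.0 - pi)
    return total

def tvae_loss(x, x_hat, mu, log_var, mu_pos, mu_neg, margin=1.0, weight=1.0):
    """Negative ELBO of the anchor plus a weighted triplet term (weight is hypothetical)."""
    neg_elbo = bernoulli_recon_loss(x, x_hat) + kl_to_standard_normal(mu, log_var)
    return neg_elbo + weight * triplet_loss(mu, mu_pos, mu_neg, margin)

def triplet_accuracy(triplets):
    """Fraction of (anchor, positive, negative) mean-vector triplets ranked correctly."""
    correct = sum(1 for a, p, n in triplets
                  if squared_distance(a, p) < squared_distance(a, n))
    return correct / len(triplets)
```

In this sketch the triplet term acts only on the mean vectors, so it shapes the geometry of the latent space while the ELBO terms keep the model generative. How the two terms are actually balanced is not stated in the abstract.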
[1] Yuxin Peng, et al. Cross-modal deep metric learning with multi-task regularization. 2017 IEEE International Conference on Multimedia and Expo (ICME), 2017.
[2] A. Choromańska. Extreme Multi Class Classification. 2013.
[3] Nir Ailon, et al. Deep Metric Learning Using Triplet Network. SIMBAD, 2014.
[4] Jason Weston, et al. Label Embedding Trees for Large Multi-Class Tasks. NIPS, 2010.
[5] Xiang Yu, et al. Deep Metric Learning via Lifted Structured Feature Embedding. 2016.
[6] Daan Wierstra, et al. Stochastic Backpropagation and Approximate Inference in Deep Generative Models. ICML, 2014.
[7] Max Welling, et al. Auto-Encoding Variational Bayes. ICLR, 2013.
[8] Yang Song, et al. Learning Fine-Grained Image Similarity with Deep Ranking. 2014 IEEE Conference on Computer Vision and Pattern Recognition, 2014.
[9] James Philbin, et al. FaceNet: A unified embedding for face recognition and clustering. 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015.
[10] Yoshua Bengio, et al. Gradient-based learning applied to document recognition. Proc. IEEE, 1998.