Student-t Variational Autoencoder for Robust Density Estimation

We propose a robust multivariate density estimator based on the variational autoencoder (VAE). The VAE is a powerful deep generative model that is widely used for multivariate density estimation. In the original VAE, the distribution of the observed continuous variables is assumed to be Gaussian, with mean and variance modeled by deep neural networks that take the latent variables as input; this distribution is called the decoder. However, training a VAE often becomes unstable. One reason is that the Gaussian decoder is sensitive to the error between a data point and its estimated mean when the estimated variance is close to zero. We resolve this instability by making the decoder robust to this error through a Bayesian treatment of the variance: we place a prior on the variance of the Gaussian decoder and marginalize it out analytically, which leads to the proposed Student-t VAE. Numerical experiments on various datasets show that training the Student-t VAE is robust and that the Student-t VAE achieves high density estimation performance.
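The mechanism the abstract describes can be illustrated numerically: marginalizing a conjugate prior over the Gaussian decoder's variance produces a Student-t likelihood, whose heavy tails penalize a large reconstruction error only logarithmically rather than quadratically. The sketch below is illustrative, not the paper's implementation; the function names and the particular decoder parameterization (mean, log-precision, log-degrees-of-freedom per dimension) are assumptions.

```python
import numpy as np
from scipy import stats


def student_t_decoder_nll(x, mu, log_lam, log_nu):
    """Student-t negative log-likelihood for a VAE decoder (sketch).

    Assumed parameterization: the decoder network outputs, per observed
    dimension, a mean `mu`, a log-precision `log_lam`, and log-degrees-
    of-freedom `log_nu`.
    """
    nu = np.exp(log_nu)             # degrees of freedom > 0
    scale = np.exp(-0.5 * log_lam)  # scale = 1 / sqrt(precision)
    return -stats.t.logpdf(x, df=nu, loc=mu, scale=scale).sum()


def gaussian_decoder_nll(x, mu, log_var):
    """Standard Gaussian decoder NLL, for comparison."""
    var = np.exp(log_var)
    return 0.5 * np.sum(np.log(2 * np.pi * var) + (x - mu) ** 2 / var)
```

For a point far from the predicted mean (e.g. `x = 10`, `mu = 0`, unit scale), the Gaussian NLL is dominated by the quadratic term `(x - mu)^2 / (2 var)` while the Student-t NLL grows only logarithmically in the error, which is the property that keeps gradients bounded when the estimated variance shrinks toward zero.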
