Generalized Denoising Auto-Encoders as Generative Models

Recent work has shown how denoising and contractive autoencoders implicitly capture the structure of the data-generating density, in the case where the corruption noise is Gaussian, the reconstruction error is the squared error, and the data is continuous-valued. This has led to various proposals for sampling from this implicitly learned density function, using Langevin and Metropolis-Hastings MCMC. However, it remained unclear how to connect the training procedure of regularized auto-encoders to the implicit estimation of the underlying data-generating distribution when the data are discrete, or using other forms of corruption process and reconstruction errors. Another issue is the mathematical justification which is only valid in the limit of small corruption noise. We propose here a different attack on the problem, which deals with all these issues: arbitrary (but noisy enough) corruption, arbitrary reconstruction loss (seen as a log-likelihood), handling both discrete and continuous-valued variables, and removing the bias due to non-infinitesimal corruption noise (or non-infinitesimal contractive penalty).

[1]  Geoffrey E. Hinton Products of experts , 1999 .

[2]  David Maxwell Chickering,et al.  Dependency Networks for Inference, Collaborative Filtering, and Data Visualization , 2000, J. Mach. Learn. Res..

[3]  Aapo Hyvärinen,et al.  Estimation of Non-Normalized Statistical Models by Score Matching , 2005, J. Mach. Learn. Res..

[4]  Pascal Vincent,et al.  Non-Local Manifold Parzen Windows , 2005, NIPS.

[5]  Yee Whye Teh,et al.  A Fast Learning Algorithm for Deep Belief Nets , 2006, Neural Computation.

[6]  Yoshua Bengio,et al.  Nonlocal Estimation of Manifold Structure , 2006, Neural Computation.

[7]  Marc'Aurelio Ranzato,et al.  Sparse Feature Learning for Deep Belief Networks , 2007, NIPS.

[8]  A. Hyvärinen,et al.  Estimation of Non-normalized Statistical Models , 2009 .

[9]  Yann LeCun,et al.  Regularized estimation of image statistics by Score Matching , 2010, NIPS.

[10]  Pascal Vincent,et al.  Contractive Auto-Encoders: Explicit Invariance During Feature Extraction , 2011, ICML.

[11]  Nando de Freitas,et al.  On Autoencoders and Score Matching for Energy Based Models , 2011, ICML.

[12]  Pascal Vincent,et al.  A Connection Between Score Matching and Denoising Autoencoders , 2011, Neural Computation.

[13]  Yoshua Bengio,et al.  A Generative Process for sampling Contractive Auto-Encoders , 2012, ICML 2012.

[14]  Pascal Vincent,et al.  Unsupervised Feature Learning and Deep Learning: A Review and New Perspectives , 2012, ArXiv.

[15]  Tapani Raiko,et al.  Enhanced Gradient for Training Restricted Boltzmann Machines , 2013, Neural Computation.

[16]  Yoshua Bengio,et al.  Better Mixing via Deep Representations , 2012, ICML.

[17]  Yoshua Bengio,et al.  What regularized auto-encoders learn from the data-generating distribution , 2012, J. Mach. Learn. Res..

[18]  Li Yao,et al.  Bounding the Test Log-Likelihood of Generative Models , 2014, ICLR.

[19]  Yoshua Bengio,et al.  Deep Generative Stochastic Networks Trainable by Backprop , 2013, ICML.