Learning Energy-Based Models in High-Dimensional Spaces with Multi-scale Denoising Score Matching

Energy-Based Models (EBMs) assign unnormalized log-probability to data samples. This capability has a variety of applications, such as sample synthesis, data denoising, sample restoration, outlier detection, and Bayesian reasoning. However, training EBMs by standard maximum likelihood is extremely slow because it requires sampling from the model distribution. Score matching potentially alleviates this problem, and denoising score matching \citep{vincent2011connection} in particular has been used successfully to train EBMs. Trained on noisy data samples at a single fixed noise level, such models learn quickly and perform well at data denoising \citep{saremi2019neural}, but they have not been shown to produce high-quality samples of high-dimensional data. Recently, \citet{song2019generative} demonstrated that a generative model trained by denoising score matching achieves excellent sample synthesis when the training data are corrupted with multiple levels of noise. Here we provide analysis and empirical evidence showing that training with multiple noise levels is necessary when the data dimension is high. Leveraging this insight, we propose a novel EBM trained with multi-scale denoising score matching. Our model generates samples of quality comparable to state-of-the-art techniques such as GANs and sets a new baseline for EBMs. It also provides density information and performs well on an image inpainting task.
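
For concreteness, the multi-scale denoising score matching objective described above can be sketched in a few lines of PyTorch. This is a minimal illustration under stated assumptions, not the authors' released implementation: `energy_net` and the `sigmas` noise schedule are hypothetical placeholders, and the per-level weighting by the squared noise scale follows the convention of \citet{song2019generative}.

```python
import torch

def multiscale_dsm_loss(energy_net, x, sigmas):
    """One stochastic estimate of a multi-scale denoising score matching loss
    for an energy-based model (a sketch; `energy_net` and `sigmas` are assumed).

    energy_net: module mapping a batch of inputs to one scalar energy each.
    x:          clean data batch, shape (B, ...).
    sigmas:     1-D tensor of noise standard deviations, e.g. a geometric
                sequence from 1.0 down to 0.01.
    """
    sigmas = sigmas.to(x.device)

    # Pick one noise level per sample from the schedule.
    idx = torch.randint(0, len(sigmas), (x.shape[0],), device=x.device)
    sigma = sigmas[idx].view(-1, *([1] * (x.dim() - 1)))

    # Corrupt the data with Gaussian noise at the chosen scale.
    noise = torch.randn_like(x)
    x_noisy = (x + sigma * noise).requires_grad_(True)

    # Model score = negative gradient of the energy w.r.t. the noisy input.
    energy = energy_net(x_noisy).sum()
    score = -torch.autograd.grad(energy, x_noisy, create_graph=True)[0]

    # Score of the Gaussian corruption kernel: -(x_noisy - x) / sigma^2.
    target = -(x_noisy - x) / sigma ** 2

    # Weight each noise level by sigma^2 so all scales contribute comparably.
    per_sample = ((score - target) ** 2).flatten(start_dim=1).sum(dim=1)
    return (sigma.flatten() ** 2 * per_sample).mean()
```

Drawing a single noise level per example keeps the cost of one training step the same as single-scale denoising score matching while still covering the whole noise schedule in expectation.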

[1] Aapo Hyvärinen, et al. Neural Empirical Bayes, 2019, J. Mach. Learn. Res.

[2] Lei Zhang, et al. Convolutional adaptive denoising autoencoders for hierarchical feature extraction, 2016, Frontiers of Computer Science.

[3] Fu Jie Huang, et al. A Tutorial on Energy-Based Learning, 2006.

[4] Geoffrey E. Hinton. Training Products of Experts by Minimizing Contrastive Divergence, 2002, Neural Computation.

[5] Peter Norvig, et al. Artificial Intelligence: A Modern Approach, 1995.

[6] Martin J. Wainwright, et al. Scale Mixtures of Gaussians and the Statistics of Natural Images, 1999, NIPS.

[7] Jian Sun, et al. Deep Residual Learning for Image Recognition, 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[8] Tian Han, et al. On the Anatomy of MCMC-based Maximum Likelihood Learning of Energy-Based Models, 2019, AAAI.

[9] Donald Geman, et al. Stochastic Relaxation, Gibbs Distributions, and the Bayesian Restoration of Images, 1984, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[10] Igor Mordatch, et al. Implicit Generation and Generalization with Energy Based Models, 2018.

[11] Wojciech Zaremba, et al. Improved Techniques for Training GANs, 2016, NIPS.

[12] Aapo Hyvärinen, et al. Estimation of Non-Normalized Statistical Models by Score Matching, 2005, J. Mach. Learn. Res.

[13] Radford M. Neal. Annealed importance sampling, 1998, Stat. Comput.

[14] Yee Whye Teh, et al. Bayesian Learning via Stochastic Gradient Langevin Dynamics, 2011, ICML.

[15] E. Parzen. On Estimation of a Probability Density Function and Mode, 1962.

[16] Neil D. Lawrence, et al. Probabilistic Non-linear Principal Component Analysis with Gaussian Process Latent Variable Models, 2005, J. Mach. Learn. Res.

[17] Sepp Hochreiter, et al. GANs Trained by a Two Time-Scale Update Rule Converge to a Local Nash Equilibrium, 2017, NIPS.

[18] Yuichi Yoshida, et al. Spectral Normalization for Generative Adversarial Networks, 2018, ICLR.

[19] B. Chandra, et al. Adaptive Noise Schedule for Denoising Autoencoder, 2014, ICONIP.

[20] Koray Kavukcuoglu, et al. Pixel Recurrent Neural Networks, 2016, ICML.

[21] Prafulla Dhariwal, et al. Glow: Generative Flow with Invertible 1x1 Convolutions, 2018, NeurIPS.

[22] Sergey Levine, et al. Reinforcement Learning with Deep Energy-Based Policies, 2017, ICML.

[23] Bernhard Schölkopf, et al. Deep Energy Estimator Networks, 2018, ArXiv.

[24] Yoshua Bengio, et al. Maximum Entropy Generators for Energy-Based Models, 2019, ArXiv.

[25] Eric Jang, et al. Generative Ensembles for Robust Anomaly Detection, 2018, ArXiv.

[26] Jian Sun, et al. Identity Mappings in Deep Residual Networks, 2016, ECCV.

[27] J. Tenenbaum, et al. A global geometric framework for nonlinear dimensionality reduction, 2000, Science.

[28] C. D. Gelatt, et al. Optimization by Simulated Annealing, 1983, Science.

[29] Tijmen Tieleman, et al. Training restricted Boltzmann machines using approximations to the likelihood gradient, 2008, ICML '08.

[30] Pascal Vincent, et al. A Connection Between Score Matching and Denoising Autoencoders, 2011, Neural Computation.

[31] Ruslan Salakhutdinov, et al. Accurate and conservative estimates of MRF log-likelihood using reverse annealing, 2014, AISTATS.

[32] Jiquan Ngiam, et al. Learning Deep Energy Models, 2011, ICML.

[33] Yoshua Bengio, et al. NICE: Non-linear Independent Components Estimation, 2014, ICLR.

[34] Y. Le Cun, et al. Double backpropagation increasing generalization performance, 1991, IJCNN-91-Seattle International Joint Conference on Neural Networks.

[35] Roman Vershynin, et al. High-Dimensional Probability, 2018.

[36] Yann LeCun, et al. Regularized estimation of image statistics by Score Matching, 2010, NIPS.

[37] Xiaogang Wang, et al. Deep Learning Face Attributes in the Wild, 2014, 2015 IEEE International Conference on Computer Vision (ICCV).

[38] Charles A. Sutton, et al. Scheduled denoising autoencoders, 2015, ICLR.

[39] S. T. Roweis, et al. Nonlinear dimensionality reduction by locally linear embedding, 2000, Science.

[40] Yang Song, et al. Generative Modeling by Estimating Gradients of the Data Distribution, 2019, NeurIPS.

[41] Rishi Sharma, et al. A Note on the Inception Score, 2018, ArXiv.

[42] Ruslan Salakhutdinov, et al. On the quantitative analysis of deep belief networks, 2008, ICML '08.

[43] Rémi Munos, et al. Autoregressive Quantile Networks for Generative Modeling, 2018, ICML.

[44] Yoshua Bengio, et al. Generative Adversarial Nets, 2014, NIPS.

[45] Yu Cheng, et al. Deep Structured Energy Based Models for Anomaly Detection, 2016, ICML.

[46] Pascal Vincent, et al. Stacked Denoising Autoencoders: Learning Useful Representations in a Deep Network with a Local Denoising Criterion, 2010, J. Mach. Learn. Res.

[47] David Kappel, et al. Deep Rewiring: Training very sparse deep networks, 2017, ICLR.

[48] David Duvenaud, et al. Invertible Residual Networks, 2018, ICML.

[49] D. Kinderlehrer, et al. The Variational Formulation of the Fokker-Planck Equation, 1996.

[50] David Duvenaud, et al. Residual Flows for Invertible Generative Modeling, 2019, NeurIPS.

[51] Yee Whye Teh, et al. Do Deep Generative Models Know What They Don't Know?, 2018, ICLR.

[52] Yang Song, et al. Sliced Score Matching: A Scalable Approach to Density and Score Estimation, 2019, UAI.

[53] Eero P. Simoncelli, et al. Efficient coding of natural images with a population of noisy Linear-Nonlinear neurons, 2011, NIPS.