No MCMC for me: Amortized sampling for fast and stable training of energy-based models

Energy-Based Models (EBMs) present a flexible and appealing way to represent uncertainty. Despite recent advances, training EBMs on high-dimensional data remains challenging, as state-of-the-art approaches are costly, unstable, and require considerable tuning and domain expertise to apply successfully. In this work, we present a simple method for training EBMs at scale which uses an entropy-regularized generator to amortize the MCMC sampling typically used in EBM training. We improve upon prior MCMC-based entropy regularization methods with a fast variational approximation. We demonstrate the effectiveness of our approach by using it to train tractable likelihood models. Next, we apply our estimator to the recently proposed Joint Energy Model (JEM), where we match the original performance with faster and more stable training. This allows us to extend JEM to semi-supervised classification on tabular data from a variety of continuous domains.
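The abstract describes the training recipe only at a high level. Below is a minimal sketch, not the authors' implementation, of the general idea it outlines: an energy network is trained with the usual contrastive maximum-likelihood gradient, but the negative samples come from an amortized generator rather than MCMC, and the generator is pushed toward low energy while an entropy term keeps it from collapsing. All names here (EnergyNet, Generator, entropy_surrogate) are hypothetical, and the nearest-neighbour entropy proxy merely stands in for the paper's fast variational entropy-gradient estimator.

    import torch
    import torch.nn as nn

    class EnergyNet(nn.Module):
        # E_theta(x): scalar energy per example (hypothetical architecture).
        def __init__(self, dim):
            super().__init__()
            self.net = nn.Sequential(nn.Linear(dim, 256), nn.SiLU(), nn.Linear(256, 1))

        def forward(self, x):
            return self.net(x).squeeze(-1)

    class Generator(nn.Module):
        # g_phi(z): amortized sampler whose draws replace MCMC negative samples.
        def __init__(self, zdim, dim):
            super().__init__()
            self.net = nn.Sequential(nn.Linear(zdim, 256), nn.SiLU(), nn.Linear(256, dim))
            self.log_std = nn.Parameter(torch.zeros(dim))

        def forward(self, z):
            mu = self.net(z)
            # Reparameterized Gaussian output so samples stay differentiable.
            return mu + self.log_std.exp() * torch.randn_like(mu)

    def entropy_surrogate(x):
        # Crude nearest-neighbour entropy proxy used only to make the sketch
        # runnable; the paper instead uses a fast variational approximation
        # to the entropy gradient.
        d = torch.cdist(x, x) + torch.eye(len(x)) * 1e6  # mask self-distances
        return d.min(dim=1).values.log().mean()

    E, G = EnergyNet(dim=2), Generator(zdim=16, dim=2)
    opt_E = torch.optim.Adam(E.parameters(), lr=1e-4)
    opt_G = torch.optim.Adam(G.parameters(), lr=1e-4)
    toy_data = torch.randn(512, 2)  # stand-in dataset

    for x_real in toy_data.split(64):
        # Energy step: contrastive maximum-likelihood gradient, with generator
        # samples standing in for MCMC samples from the model distribution.
        x_fake = G(torch.randn(x_real.size(0), 16))
        loss_E = E(x_real).mean() - E(x_fake.detach()).mean()
        opt_E.zero_grad(); loss_E.backward(); opt_E.step()

        # Generator step: push samples toward low energy; the entropy term
        # keeps the amortized sampler from collapsing onto a single mode.
        x_fake = G(torch.randn(x_real.size(0), 16))
        loss_G = E(x_fake).mean() - entropy_surrogate(x_fake)
        opt_G.zero_grad(); loss_G.backward(); opt_G.step()

The sketch only conveys the two-player structure; in the actual method the entropy term and its gradient are handled with the fast variational approximation mentioned in the abstract, which is what avoids MCMC entirely during training.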
