Cutting out the Middle-Man: Training and Evaluating Energy-Based Models without Sampling

We present a new method for evaluating and training unnormalized density models. Our approach only requires access to the gradient of the unnormalized model's log-density. We estimate the Stein discrepancy between the data density $p(x)$ and the model density $q(x)$; this discrepancy is defined through a vector-valued function of the data. We parameterize this function with a neural network and fit its parameters to maximize the discrepancy. This yields a novel goodness-of-fit test which outperforms existing methods on high-dimensional data. Furthermore, optimizing $q(x)$ to minimize this discrepancy produces a novel method for training unnormalized models which scales more gracefully than existing methods. The ability to both learn and compare models is a unique feature of the proposed method.
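To make the quantity in the abstract concrete, here is a minimal sketch of a Monte Carlo Stein discrepancy estimate, $\mathbb{E}_{p}[\nabla_x \log q(x)^\top f(x) + \mathrm{tr}(\nabla_x f(x))]$, which needs only the model's score and samples from the data. It is an illustration under stated assumptions, not the paper's implementation: the model $q$ is taken to be a standard normal, and the neural-network critic is replaced by a fixed, hypothetical linear function `critic(x) = a * x` whose Jacobian trace is known in closed form. All function names are made up for this example.

```python
import random

def score_q(x):
    """Score (gradient of the log-density) of a standard normal model q.
    The normalizing constant never appears."""
    return [-xi for xi in x]

def critic(x, a=-1.0):
    """Hypothetical linear critic f(x) = a * x, standing in for a neural network."""
    return [a * xi for xi in x]

def critic_trace(x, a=-1.0):
    """Trace of the Jacobian of f(x) = a * x, which equals a * d."""
    return a * len(x)

def stein_estimate(samples, a=-1.0):
    """Monte Carlo estimate of E_p[ s_q(x)^T f(x) + tr(df/dx) ]."""
    total = 0.0
    for x in samples:
        s = score_q(x)
        f = critic(x, a)
        total += sum(si * fi for si, fi in zip(s, f)) + critic_trace(x, a)
    return total / len(samples)

def sample_gaussian(n, d, std, rng):
    """Draw n points from an isotropic zero-mean Gaussian with the given std."""
    return [[rng.gauss(0.0, std) for _ in range(d)] for _ in range(n)]

rng = random.Random(0)
matched = sample_gaussian(20000, 2, 1.0, rng)     # data drawn from q itself
mismatched = sample_gaussian(20000, 2, 2.0, rng)  # data with a larger spread than q

est_matched = stein_estimate(matched)
est_mismatched = stein_estimate(mismatched)
print(est_matched, est_mismatched)
```

With this critic the integrand reduces to $\|x\|^2 - d$, so the estimate should sit near zero when the data match $q$ and near $6$ for the wider two-dimensional Gaussian. The method described in the abstract would instead fit the critic's parameters to maximize this quantity, rather than fixing it by hand as done here.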

R. Zemel | D. Duvenaud | J. Jacobsen | Will Grathwohl | Kuan-Chieh Wang
