Continuous-Time Flows for Deep Generative Models

Normalizing flows have been developed recently as a method for drawing samples from an arbitrary distribution. This method is attractive due to its intrinsic ability to approximate a target distribution arbitrarily well. In practice, however, normalizing flows consist of only a finite number of deterministic transformations, and thus there are no guarantees on the approximation accuracy. In this paper we study the problem of learning deep generative models with continuous-time flows (CTFs), a family of diffusion-based methods that are able to asymptotically approach a target distribution. We discretize the CTF to make training feasible, and develop theory on the approximation error. A framework is then adopted to distill knowledge from a CTF into an efficient inference network. We apply the technique to deep generative models, including a CTF-based variational autoencoder and an adversarial-network-like density estimator. Experiments on various tasks demonstrate the superiority of the proposed CTF framework compared to existing techniques.
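
To make the idea of discretizing a diffusion-based flow concrete, the following is a minimal sketch, not the method of the paper. It assumes the flow is a Langevin diffusion dz = grad log p(z) dt + sqrt(2) dW, discretized with a plain Euler-Maruyama scheme. The target (a 1-D Gaussian with mean 2 and unit variance), the step size, and the names `grad_log_target` and `discretized_langevin_flow` are all hypothetical choices made for illustration.

```python
import numpy as np


def grad_log_target(z, mean=2.0):
    """Score of a hypothetical 1-D Gaussian target N(mean, 1)."""
    return -(z - mean)


def discretized_langevin_flow(z0, step_size=0.01, num_steps=1000, seed=0):
    """Push particles z0 through an Euler-Maruyama discretization of
    the Langevin diffusion dz = grad log p(z) dt + sqrt(2) dW.

    This is an illustrative assumption, not the paper's exact scheme.
    """
    rng = np.random.default_rng(seed)
    z = np.array(z0, dtype=float)
    for _ in range(num_steps):
        noise = rng.standard_normal(z.shape)
        # Drift toward high-density regions plus injected Gaussian noise.
        z = z + step_size * grad_log_target(z) + np.sqrt(2.0 * step_size) * noise
    return z


# Start from a standard normal, far from the target mean of 2.
init = np.random.default_rng(1).standard_normal(5000)
samples = discretized_langevin_flow(init)
```

With a finite step size, the particles approach the target only approximately. In this Gaussian example the discretized chain settles at a variance slightly above 1, which is the kind of discretization error the approximation theory mentioned above is concerned with.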
