Invertible normalizing flow neural networks by JKO scheme

Normalizing flows are a class of deep generative models for efficient sampling and density estimation. In practice, the flow often appears as a chain of invertible neural network blocks; to facilitate training, existing works have regularized flow trajectories and designed special network architectures. The current paper develops a neural ODE flow network inspired by the Jordan-Kinderlehrer-Otto (JKO) scheme, which allows efficient block-wise training of the residual blocks without sampling SDE trajectories or inner loops of score matching or variational learning. As the JKO scheme unfolds the dynamics of the gradient flow, the proposed model naturally stacks residual network blocks one by one, reducing the memory load and the difficulty of end-to-end training of deep flow networks. We also develop an adaptive time reparameterization of the flow network with progressive refinement of the trajectory in probability space, which improves model training efficiency and accuracy in practice. Using numerical experiments with synthetic and real data, we show that the proposed JKO-iFlow model achieves similar or better performance in generating new samples compared with existing flow and diffusion models, at a significantly reduced computational and memory cost.
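To make the block-wise training idea concrete, below is a minimal, hypothetical PyTorch sketch (not the authors' released code): each residual block x -> x + h*f(x) is trained as one JKO step by minimizing a transport cost, which approximates the squared-Wasserstein-2 proximal penalty, plus the free energy of the pushed-forward samples (a Gaussian potential minus the log-determinant entropy term), and new blocks are stacked one by one on data pushed through the frozen earlier blocks. Names such as JKOBlock and jko_step_loss, the step size h, the toy data, and the exact-Jacobian log-determinant are illustrative assumptions for low-dimensional data.

```python
import torch
import torch.nn as nn

class JKOBlock(nn.Module):
    """One residual block x -> x + h * f(x), trained as a single JKO step."""
    def __init__(self, dim=2, hidden=64, h=0.2):
        super().__init__()
        self.h = h
        self.f = nn.Sequential(
            nn.Linear(dim, hidden), nn.Softplus(),
            nn.Linear(hidden, hidden), nn.Softplus(),
            nn.Linear(hidden, dim),
        )

    def forward(self, x):
        # A small step size h keeps the residual map close to the identity (hence invertible).
        return x + self.h * self.f(x)

def jko_step_loss(block, x):
    """Per-block objective: transport cost + free energy of the pushed-forward samples."""
    x = x.detach().requires_grad_(True)
    y = block(x)                                            # (batch, dim)
    # (1/(2h)) * E||y - x||^2 approximates the squared-W2 proximal penalty of the JKO step.
    transport = ((y - x) ** 2).sum(dim=1).mean() / (2.0 * block.h)
    # Potential term: negative log-density of the standard normal base, up to a constant.
    potential = 0.5 * (y ** 2).sum(dim=1).mean()
    # Entropy term: exact per-sample log|det(dy/dx)| via autograd (feasible in low dimension;
    # a stochastic Hutchinson trace estimator would replace this in high dimension).
    rows = [torch.autograd.grad(y[:, i].sum(), x, create_graph=True)[0]
            for i in range(y.shape[1])]
    jac = torch.stack(rows, dim=1)                          # (batch, dim, dim)
    logdet = torch.logdet(jac).mean()
    return transport + potential - logdet

def train_block(new_block, data, frozen_blocks, iters=300, lr=1e-3, batch=256):
    """Train one new block on data pushed through the already-trained (frozen) blocks."""
    opt = torch.optim.Adam(new_block.parameters(), lr=lr)
    for _ in range(iters):
        x = data[torch.randint(0, data.shape[0], (batch,))]
        with torch.no_grad():
            for b in frozen_blocks:
                x = b(x)
        loss = jko_step_loss(new_block, x)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return new_block

if __name__ == "__main__":
    torch.manual_seed(0)
    # Toy 2D mixture of two Gaussians as the data distribution.
    data = torch.cat([0.3 * torch.randn(2000, 2) + torch.tensor([2.0, 0.0]),
                      0.3 * torch.randn(2000, 2) + torch.tensor([-2.0, 0.0])])
    blocks = []
    for _ in range(4):                                      # stack residual blocks one by one
        blocks.append(train_block(JKOBlock(), data, blocks))
```

Because each block is optimized in isolation against the frozen composition of earlier blocks, only one block's activations and parameters need gradients at a time, which is the source of the reduced memory footprint compared with end-to-end training of a deep flow.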
