OT-Flow: Fast and Accurate Continuous Normalizing Flows via Optimal Transport

A normalizing flow is an invertible mapping between an arbitrary probability distribution and a standard normal distribution; it can be used for density estimation and statistical inference. Computing the flow follows the change of variables formula and thus requires invertibility of the mapping and an efficient way to compute the determinant of its Jacobian. To satisfy these requirements, normalizing flows typically consist of carefully chosen components. Continuous normalizing flows (CNFs) are mappings obtained by solving a neural ordinary differential equation (ODE). The neural ODE's dynamics can be chosen almost arbitrarily while ensuring invertibility. Moreover, the log-determinant of the flow's Jacobian can be obtained by integrating the trace of the dynamics' Jacobian along the flow. Our proposed OT-Flow approach tackles two critical computational challenges that limit the more widespread use of CNFs. First, OT-Flow leverages optimal transport (OT) theory to regularize the CNF and enforce straight trajectories that are easier to integrate. Second, OT-Flow features exact trace computation with time complexity equal to that of the trace estimators used in existing CNFs. On five high-dimensional density estimation and generative modeling tasks, OT-Flow performs competitively with a state-of-the-art CNF while on average requiring one-fourth of the number of weights, with a 19x speedup in training time and a 28x speedup in inference.
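The statement that the log-determinant of the flow's Jacobian equals the integral of the trace of the dynamics' Jacobian can be checked on a toy problem. The sketch below is not the OT-Flow implementation: it assumes hypothetical linear dynamics dz/dt = A z with a fixed 3x3 matrix `A`, so the Jacobian of the dynamics is simply `A`, and it uses a hand-written RK4 integrator. It also contrasts the exact trace with a Hutchinson-type stochastic estimate of the kind used in existing CNFs. All function names are illustrative.

```python
import numpy as np

def integrate_flow(Z0, A, T=1.0, n_steps=200):
    """Integrate dZ/dt = A Z with RK4 and accumulate the trace integral.

    Returns the final state and the integral of tr(A) over [0, T].
    For these (assumed) linear dynamics the Jacobian of the dynamics is A itself.
    """
    h = T / n_steps
    Z = Z0.copy()
    trace_integral = 0.0
    for _ in range(n_steps):
        k1 = A @ Z
        k2 = A @ (Z + 0.5 * h * k1)
        k3 = A @ (Z + 0.5 * h * k2)
        k4 = A @ (Z + h * k3)
        Z = Z + (h / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
        trace_integral += h * np.trace(A)  # exact trace of the dynamics' Jacobian
    return Z, trace_integral

def hutchinson_trace(A, n_samples, rng):
    """Stochastic trace estimate: the mean of v^T A v over Rademacher vectors v."""
    v = rng.choice([-1.0, 1.0], size=(n_samples, A.shape[0]))
    return float(np.mean(np.einsum("ni,ij,nj->n", v, A, v)))

# Hypothetical dynamics matrix, chosen only for illustration.
A = np.array([[0.3, -0.5, 0.1],
              [0.4, -0.2, 0.6],
              [-0.7, 0.2, 0.1]])

# Integrating the identity matrix yields the Jacobian of the flow map at time T.
flow_jacobian, trace_integral = integrate_flow(np.eye(3), A)
sign, logabsdet = np.linalg.slogdet(flow_jacobian)

print("log|det| of the flow Jacobian:", logabsdet)
print("integral of the trace:        ", trace_integral)
print("Hutchinson estimate of tr(A): ",
      hutchinson_trace(A, 20000, np.random.default_rng(0)))
```

The first two printed values should agree up to integration error, while the Hutchinson estimate only approximates the trace and varies with the random draws. This is the motivation for computing the trace exactly when it can be done at comparable cost.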
