Universal Approximation for Log-concave Distributions using Well-conditioned Normalizing Flows

Affine-coupling models (Dinh et al., 2014; 2016) are a common type of normalizing flow, for which the Jacobian of the latent-to-observable-variable transformation is triangular, allowing the likelihood to be computed in linear time. Despite the widespread usage of affine couplings, the special structure of the architecture makes understanding their representational power challenging. The question of universal approximation was only recently resolved by three parallel papers (Huang et al., 2020; Zhang et al., 2020; Koehler et al., 2020), which showed that reasonably regular distributions can be approximated arbitrarily well using affine couplings, albeit with networks whose Jacobians are nearly singular. As ill-conditioned Jacobians are an obstacle for likelihood-based training, the fundamental question remains: which distributions can be approximated using well-conditioned affine coupling flows? In this paper, we show that any log-concave distribution can be approximated using well-conditioned affine-coupling flows. In terms of proof techniques, we uncover deep connections between affine coupling architectures, underdamped Langevin dynamics (a stochastic differential equation often used to sample from Gibbs measures), and Hénon maps (a structured dynamical system that appears in the study of symplectic diffeomorphisms). In terms of informing practice, we approximate a padded version of the input distribution with iid Gaussians, a strategy that Koehler et al. (2020) empirically observed to result in better-conditioned flows but which hitherto lacked theoretical grounding. Our proof can thus be seen as providing theoretical evidence for the benefits of Gaussian padding when training normalizing flows.

Duke University and Carnegie Mellon University. Correspondence to: Andrej Risteski <aristesk@andrew.cmu.edu>.
Third Workshop on Invertible Neural Networks, Normalizing Flows, and Explicit Likelihood Models (ICML 2021). Copyright 2021 by the author(s).
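As a concrete illustration of the two mechanisms the abstract refers to, the linear-time likelihood computation enabled by a triangular Jacobian in an affine coupling layer, and padding the input with iid Gaussian coordinates, here is a minimal NumPy sketch. The function names and the toy s/t networks are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def affine_coupling_forward(x, s_net, t_net, d):
    """One affine coupling layer: the first d coordinates pass through
    unchanged; the remaining coordinates are scaled and shifted by
    functions of the first d."""
    x1, x2 = x[:d], x[d:]
    s, t = s_net(x1), t_net(x1)              # arbitrary (e.g. neural) maps of x1
    y = np.concatenate([x1, x2 * np.exp(s) + t])
    # The Jacobian is triangular, so its determinant is prod(exp(s)) and the
    # log-determinant is a linear-time sum -- no O(d^3) determinant needed.
    log_det = np.sum(s)
    return y, log_det

def gaussian_pad(x, k, sigma=1.0, rng=None):
    """Append k iid N(0, sigma^2) coordinates to a sample: the padding
    strategy whose conditioning benefits the paper analyzes."""
    rng = np.random.default_rng(0) if rng is None else rng
    return np.concatenate([x, sigma * rng.standard_normal(k)])

# Toy usage: pad a 4-d point to 6-d, then apply one coupling layer.
s_net = lambda x1: 0.1 * np.tanh(x1)         # bounded s keeps the Jacobian well-conditioned
t_net = lambda x1: x1 ** 2
x = gaussian_pad(np.array([0.5, -1.0, 2.0, 0.3]), k=2)
y, log_det = affine_coupling_forward(x, s_net, t_net, d=3)
print(y.shape, log_det)
```

Note the design point the sketch makes explicit: keeping the scale outputs s bounded keeps every diagonal entry exp(s) of the Jacobian bounded away from 0 and infinity, which is exactly the well-conditionedness the paper's result concerns.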
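For readers unfamiliar with the dynamical systems invoked above, the display below recalls their standard textbook forms; this is background notation, not the paper's own construction.

```latex
% Underdamped (kinetic) Langevin dynamics: x_t is position, v_t velocity,
% \gamma > 0 the friction coefficient, U the potential of the target Gibbs
% measure p(x) \propto e^{-U(x)}, and B_t a standard Brownian motion.
\begin{aligned}
dx_t &= v_t \, dt, \\
dv_t &= -\gamma v_t \, dt - \nabla U(x_t) \, dt + \sqrt{2\gamma} \, dB_t.
\end{aligned}
% The classical H\'enon map [28], a structured planar dynamical system
% with scalar parameters a, b:
(x, y) \mapsto \bigl(1 - a x^2 + y,\; b x\bigr).
```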

[1] D. Bakry, I. Gentil, and M. Ledoux. Analysis and Geometry of Markov Diffusion Operators. Springer, 2013.

[2] R. Vershynin. High-Dimensional Probability: An Introduction with Applications in Data Science. Cambridge University Press, 2018.

[3] Y.-A. Ma, et al. Is There an Analog of Nesterov Acceleration for MCMC? arXiv, 2019.

[4] M. M. Peet. Exponentially Stable Nonlinear Systems Have Polynomial Lyapunov Functions on Bounded Regions. IEEE Transactions on Automatic Control, 2007.

[5] M. Talagrand. Transportation cost for Gaussian and other product measures. 1996.

[6] G. Hargé. A convex/log-concave correlation inequality for Gaussian measure and an application to abstract Wiener spaces. 2004.

[7] D. P. Kingma and P. Dhariwal. Glow: Generative Flow with Invertible 1x1 Convolutions. NeurIPS, 2018.

[8] Y. Song and S. Ermon. Generative Modeling by Estimating Gradients of the Data Distribution. NeurIPS, 2019.

[9] L. Dinh, D. Krueger, and Y. Bengio. NICE: Non-linear Independent Components Estimation. ICLR, 2014.

[10] H. J. Brascamp and E. H. Lieb. On extensions of the Brunn-Minkowski and Prékopa-Leindler theorems, including inequalities for log concave functions, and with an application to the diffusion equation. 1976.

[11] D. J. Rezende and S. Mohamed. Variational Inference with Normalizing Flows. ICML, 2015.

[12] L. Dinh, J. Sohl-Dickstein, and S. Bengio. Density estimation using Real NVP. ICLR, 2016.

[13] A. V. Kolesnikov. Mass transportation and contractions. arXiv:1103.1479, 2011.

[14] A. Taghvaei and P. G. Mehta. Accelerated Flow for Probability Distributions. ICML, 2019.

[15] C. Chicone. Ordinary Differential Equations with Applications. Springer, 2006.

[16] W. Grathwohl, et al. FFJORD: Free-form Continuous Dynamics for Scalable Reversible Generative Models. ICLR, 2018.

[17] E. Dupont, A. Doucet, and Y. W. Teh. Augmented Neural ODEs. NeurIPS, 2019.

[18] F. Koehler, V. Mehta, and A. Risteski. Representational aspects of depth and conditioning in normalizing flows. ICML, 2020.

[19] U. M. Ascher and C. Greif. A First Course in Numerical Methods. SIAM, 2011.

[20] F. Otto and C. Villani. Generalization of an Inequality by Talagrand and Links with the Logarithmic Sobolev Inequality. 2000.

[21] S. G. Bobkov and F. Götze. Exponential Integrability and Transportation Cost Related to Logarithmic Sobolev Inequalities. 1999.

[22] Y. Song, et al. Score-Based Generative Modeling through Stochastic Differential Equations. ICLR, 2020.

[23] P. J. Fox. The Foundations of Mechanics. Science, 1918.

[24] A. Saumard and J. A. Wellner. Log-Concavity and Strong Log-Concavity: A Review. Statistics Surveys, 2014.

[25] P. Jaini, K. A. Selby, and Y. Yu. Sum-of-Squares Polynomial Flow. ICML, 2019.

[26] J. Behrmann, et al. Invertible Residual Networks. ICML, 2018.

[27] L. Polterovich. The Geometry of the Group of Symplectic Diffeomorphisms. Birkhäuser, 2001.

[28] M. Hénon. A two-dimensional mapping with a strange attractor. Communications in Mathematical Physics, 1976.

[29] D. Turaev. Polynomial approximations of symplectic dynamics and richness of chaos in non-hyperbolic area-preserving maps. Nonlinearity, 2003.

[30] C.-W. Huang, L. Dinh, and A. Courville. Augmented Normalizing Flows: Bridging the Gap Between Generative Flows and Latent Variable Models. arXiv, 2020.

[31] T. Teshima, et al. Coupling-based Invertible Neural Networks Are Universal Diffeomorphism Approximators. NeurIPS, 2020.