Coupling-based Invertible Neural Networks Are Universal Diffeomorphism Approximators

Invertible neural networks based on coupling flows (CF-INNs) have various machine learning applications such as image synthesis and representation learning. However, their desirable properties, such as analytic invertibility, come at the cost of restricting their functional forms. This raises a question about their representation power: are CF-INNs universal approximators of invertible functions? Without universality, there could be a well-behaved invertible transformation that a CF-INN can never approximate, which would render the model class unreliable. We answer this question by establishing a convenient criterion: a CF-INN is universal if its layers contain affine coupling and invertible linear functions as special cases. As a corollary, we affirmatively resolve a previously open problem: whether normalizing flow models based on affine coupling can be universal distributional approximators. In the course of proving the universality, we prove a general theorem showing the equivalence of universality for certain diffeomorphism classes, a theoretical insight of interest in its own right.
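To make the layer classes named in the criterion concrete, the following is a minimal Python/NumPy sketch (not the paper's code) of an affine coupling layer composed with an invertible linear map, the two building blocks the criterion requires as special cases. The conditioner functions `scale_net` and `shift_net`, the split index, and the toy matrices are illustrative assumptions; in practice the conditioners are arbitrary neural networks.

```python
import numpy as np

def affine_coupling_forward(x, scale_net, shift_net, split):
    """y = [x1, x2 * exp(s(x1)) + t(x1)], with x split at index `split`."""
    x1, x2 = x[:split], x[split:]
    s, t = scale_net(x1), shift_net(x1)
    return np.concatenate([x1, x2 * np.exp(s) + t])

def affine_coupling_inverse(y, scale_net, shift_net, split):
    """Analytic inverse: x2 = (y2 - t(y1)) * exp(-s(y1))."""
    y1, y2 = y[:split], y[split:]
    s, t = scale_net(y1), shift_net(y1)
    return np.concatenate([y1, (y2 - t) * np.exp(-s)])

def invertible_linear_forward(x, W):
    """Invertible linear layer; W must be a nonsingular matrix."""
    return W @ x

# Toy usage with hypothetical one-layer conditioners on a 4-d input.
rng = np.random.default_rng(0)
A, b = rng.normal(size=(2, 2)), rng.normal(size=2)
scale_net = lambda h: np.tanh(A @ h)          # bounded log-scales for stability
shift_net = lambda h: A @ h + b
W = rng.normal(size=(4, 4)) + 4 * np.eye(4)   # well-conditioned, hence invertible

x = rng.normal(size=4)
y = invertible_linear_forward(affine_coupling_forward(x, scale_net, shift_net, 2), W)
x_rec = affine_coupling_inverse(np.linalg.solve(W, y), scale_net, shift_net, 2)
assert np.allclose(x, x_rec)                  # exact invertibility by construction
```

The sketch illustrates why CF-INNs are attractive despite their restricted functional form: both layers are invertible in closed form, so the composed map can be evaluated and inverted exactly, which is the setting in which the universality question of the paper is posed.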
