Argmax Flows and Multinomial Diffusion: Learning Categorical Distributions

Generative flows and diffusion models have been predominantly trained on ordinal data, such as natural images. This paper introduces two extensions of flows and diffusion for categorical data such as language or image segmentation: Argmax Flows and Multinomial Diffusion. Argmax Flows are defined by the composition of a continuous distribution (such as a normalizing flow) and an argmax function. To optimize this model, we learn a probabilistic inverse of the argmax that lifts the categorical data to a continuous space. Multinomial Diffusion gradually adds categorical noise in a diffusion process, for which the generative denoising process is learned. We demonstrate that our methods outperform existing dequantization approaches on text modelling and on modelling image segmentation maps, as measured in log-likelihood.
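As a rough sketch of the Argmax Flow construction in code: sampling composes a continuous model with an argmax, and training requires a probabilistic inverse that lifts a class label back to a continuous point whose argmax recovers that label. The flow interface, the class count K, and the Gumbel-based lifting below are assumptions for illustration only; the paper learns this inverse rather than fixing it.

import torch

K = 27  # number of classes, e.g. characters; value assumed for illustration

def sample_categorical(flow, num_samples):
    # `flow` stands in for any continuous density model with a .sample(n)
    # method returning tensors of shape (n, K); this interface is an
    # assumption, not the paper's actual API.
    z = flow.sample(num_samples)   # continuous sample z in R^{n x K}
    return z.argmax(dim=-1)        # argmax maps R^K onto {0, ..., K-1}

def lift(x, eps=1e-9):
    # One simple probabilistic inverse of the argmax: draw Gumbel noise
    # and force the coordinate of the observed class x to be the maximum,
    # so that argmax(lift(x)) == x by construction. Illustrative choice
    # only; the paper learns the inverse.
    n = x.shape[0]
    u = torch.rand(n, K).clamp_min(eps)
    g = -torch.log(-torch.log(u))                       # Gumbel(0, 1) noise
    g[torch.arange(n), x] = g.max(dim=-1).values + 1.0  # make class x win
    return g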

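The forward (noising) process of Multinomial Diffusion can likewise be sketched: each step keeps a category with probability 1 - beta_t and otherwise resamples it uniformly over the K classes, i.e. x_t ~ Cat((1 - beta_t) * onehot(x_{t-1}) + beta_t / K); the generative model is then trained to reverse this. The schedule and constants below are assumptions for illustration.

import torch

K = 27                                  # number of classes (assumed)
T = 1000                                # number of diffusion steps (assumed)
betas = torch.linspace(1e-4, 0.02, T)   # assumed linear noise schedule

def q_sample_step(x_prev, t):
    # One forward noising step: with probability (1 - beta_t) keep the
    # current class label, with probability beta_t resample it uniformly
    # over the K classes.
    n = x_prev.shape[0]
    probs = torch.full((n, K), betas[t].item() / K)
    probs[torch.arange(n), x_prev] += 1.0 - betas[t].item()
    return torch.multinomial(probs, num_samples=1).squeeze(-1)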