Generative Flows with Invertible Attentions

Flow-based generative models have shown an excellent ability to explicitly learn the probability density function of data via a sequence of invertible transformations. Yet, modeling long-range dependencies within normalizing flows remains understudied. To fill this gap, we introduce two types of invertible attention mechanisms for generative flow models: map-based and scaled dot-product attention, applicable to both unconditional and conditional flows. The key idea is a split-based attention mechanism that learns the attention weights and input representations on the two splits of the flow feature maps. The resulting attention modules are invertible and have tractable Jacobian determinants, so they can be inserted at any position in a flow-based model. By capturing global data dependencies, the proposed attention leads to more expressive flow models. Evaluations on multiple generation tasks demonstrate that the proposed attention flows are efficient and compare favorably against state-of-the-art unconditional and conditional generative flow methods.

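A minimal sketch of how such a split-based invertible attention module could look, assuming a coupling-style design in PyTorch: map-based attention weights are predicted from one split of the feature map and used to modulate the other split affinely, which keeps the Jacobian triangular and its log-determinant tractable. The class, layer choices, and names below are illustrative assumptions, not the paper's exact architecture.

```python
# Hypothetical split-based invertible attention as a coupling layer.
# Attention is computed from the first split only, so the transform stays
# invertible and log|det J| is a simple sum over the predicted log-scales.
import torch
import torch.nn as nn


class SplitAttentionCoupling(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        half = channels // 2
        # Map-based attention: per-pixel weights predicted from the first split.
        self.attn = nn.Sequential(
            nn.Conv2d(half, half, kernel_size=3, padding=1),
            nn.Sigmoid(),  # attention map in (0, 1)
        )
        # Scale and shift predicted from the attended first split.
        self.scale = nn.Conv2d(half, half, kernel_size=3, padding=1)
        self.shift = nn.Conv2d(half, half, kernel_size=3, padding=1)

    def forward(self, x: torch.Tensor):
        x1, x2 = x.chunk(2, dim=1)          # split along channels
        h = self.attn(x1) * x1              # attention-weighted representation of x1
        s = torch.tanh(self.scale(h))       # bounded log-scale for stability
        t = self.shift(h)
        y2 = x2 * torch.exp(s) + t          # affine transform of the second split
        logdet = s.flatten(1).sum(dim=1)    # tractable log|det J| per sample
        return torch.cat([x1, y2], dim=1), logdet

    def inverse(self, y: torch.Tensor):
        y1, y2 = y.chunk(2, dim=1)
        h = self.attn(y1) * y1              # x1 == y1, so attention is recomputable
        s = torch.tanh(self.scale(h))
        t = self.shift(h)
        x2 = (y2 - t) * torch.exp(-s)       # exact inverse of the forward pass
        return torch.cat([y1, x2], dim=1)


# Usage: forward returns transformed features and log|det J|; inverse() recovers
# the input up to numerical precision.
layer = SplitAttentionCoupling(channels=8)
x = torch.randn(2, 8, 16, 16)
y, logdet = layer(x)
x_rec = layer.inverse(y)
assert torch.allclose(x, x_rec, atol=1e-4)
```

Because only one split is transformed while the other passes through unchanged, the Jacobian is triangular, which is what makes the log-determinant cheap to evaluate and the module free to be stacked at any depth of the flow.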