Background and foreground disentangled generative adversarial network for scene image synthesis

Abstract Although recent generative models have made remarkable progress in adversarial image synthesis, generating high-fidelity images that contain diverse entities and complex scene layouts from structured descriptions remains a pivotal and challenging problem. To this end, we present a Background and Foreground Disentangled Generative Adversarial Network (BFD-GAN) to synthesize high-quality images from scene graphs. First, our method uses a graph convolutional network to infer a semantic background from the input scene graph. Then, a foreground parsing module, which encourages unsupervised generation, is proposed to infer semantically related foregrounds with fine-grained geometric properties. Furthermore, a foreground-background integrating module is employed for the final image generation, in which a foreground-relation-aware attention mechanism refines and fuses the inferred foregrounds into the background. Evaluated on the COCO-Stuff and Visual Genome datasets, we benchmark our model against existing methods and show that BFD-GAN is more capable of generating complex backgrounds and the corresponding sharp foregrounds for given scene structures.
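The abstract describes a three-stage pipeline: graph-convolutional background inference, foreground parsing, and attention-based foreground-background fusion. The following is a minimal sketch of that data flow, assuming PyTorch; the module names (`GraphConvLayer`, `BFDGANSketch`), tensor shapes, and layer sizes are illustrative assumptions and not the paper's actual architecture.

```python
# Hypothetical sketch of the BFD-GAN pipeline outlined in the abstract.
# All layer sizes and aggregation choices are assumptions for illustration.
import torch
import torch.nn as nn


class GraphConvLayer(nn.Module):
    """One graph-convolution step over (subject, predicate, object) triples."""

    def __init__(self, dim):
        super().__init__()
        self.update = nn.Linear(3 * dim, dim)

    def forward(self, obj_vecs, pred_vecs, edges):
        # edges: (num_triples, 2) holding subject/object indices into obj_vecs.
        s = obj_vecs[edges[:, 0]]
        o = obj_vecs[edges[:, 1]]
        msg = self.update(torch.cat([s, pred_vecs, o], dim=-1))
        # Simplified aggregation: add each triple's message back to its subject node.
        out = obj_vecs.clone()
        out.index_add_(0, edges[:, 0], msg)
        return out


class BFDGANSketch(nn.Module):
    """Background inference -> foreground parsing -> attention-based fusion."""

    def __init__(self, dim=128, img_size=64):
        super().__init__()
        self.gcn = GraphConvLayer(dim)
        # Background branch: pooled graph embedding -> coarse semantic background.
        self.background_net = nn.Sequential(
            nn.Linear(dim, 3 * img_size * img_size), nn.Tanh()
        )
        # Foreground parsing: per-object embedding -> one foreground map per object.
        self.foreground_net = nn.Sequential(
            nn.Linear(dim, 3 * img_size * img_size), nn.Tanh()
        )
        # Stand-in for foreground-relation-aware attention: one weight per object.
        self.attn = nn.Linear(dim, 1)
        self.img_size = img_size

    def forward(self, obj_vecs, pred_vecs, edges):
        n = obj_vecs.size(0)
        h = self.gcn(obj_vecs, pred_vecs, edges)
        background = self.background_net(h.mean(dim=0)).view(
            3, self.img_size, self.img_size
        )
        foregrounds = self.foreground_net(h).view(n, 3, self.img_size, self.img_size)
        weights = torch.softmax(self.attn(h), dim=0).view(n, 1, 1, 1)
        # Fuse the attended foregrounds into the inferred background.
        return background + (weights * foregrounds).sum(dim=0)


if __name__ == "__main__":
    dim = 128
    obj_vecs = torch.randn(4, dim)   # 4 objects in the scene graph
    pred_vecs = torch.randn(3, dim)  # 3 relations
    edges = torch.tensor([[0, 1], [1, 2], [2, 3]])
    img = BFDGANSketch(dim)(obj_vecs, pred_vecs, edges)
    print(img.shape)  # torch.Size([3, 64, 64])
```

In the actual model the background and foreground branches would be convolutional generators and the fusion would use the paper's foreground-relation-aware attention; this sketch only mirrors the stage ordering stated in the abstract.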
