Attentive Generative Adversarial Network To Bridge Multi-Domain Gap For Image Synthesis

Despite significant progress on text-to-image synthesis, automatically generating realistic images remains a challenging task, since the locations and specific shapes of objects are not given in the text descriptions. To address this problem, we propose a novel attentive generative adversarial network with contextual loss (AGAN-CL). More specifically, the generative network consists of two sub-networks: a contextual network that generates image contours, and a cycle transformation autoencoder that converts those contours into realistic images. Our core idea is the injection of image contours into the generative network; this is the most critical part of our design, since the contours guide the whole network to focus on object regions. In addition, we apply a contextual loss and a cycle-consistency loss to bridge the multi-domain gap. Comprehensive results on several challenging datasets demonstrate the advantage of the proposed method over leading approaches in terms of both visual fidelity and alignment with the input descriptions.
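To make the two-stage design concrete, below is a minimal PyTorch sketch of the pipeline the abstract describes: a contextual network mapping a text embedding (plus noise) to a contour map, followed by a cycle transformation autoencoder that renders the contour as an RGB image and maps it back to a contour for the cycle-consistency term. All module names, layer configurations, and loss weights here are illustrative assumptions, not the authors' implementation, and the contextual loss is approximated by a simple L1 term.

```python
# Illustrative sketch only; module names, sizes, and weights are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ContextualNetwork(nn.Module):
    """Stage 1: map a text embedding (plus noise) to a coarse contour map."""

    def __init__(self, text_dim=256, noise_dim=100, img_size=64):
        super().__init__()
        self.img_size = img_size
        self.fc = nn.Linear(text_dim + noise_dim, 128 * (img_size // 8) ** 2)
        self.up = nn.Sequential(                  # upsample 8x to full resolution
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 1, 4, stride=2, padding=1), nn.Tanh(),
        )

    def forward(self, text_emb, noise):
        h = self.fc(torch.cat([text_emb, noise], dim=1))
        h = h.view(-1, 128, self.img_size // 8, self.img_size // 8)
        return self.up(h)                         # (B, 1, H, W) contour map


class CycleAutoencoder(nn.Module):
    """Stage 2: translate contours to RGB images; a reverse branch maps
    images back to contours so a cycle-consistency loss can be applied."""

    def __init__(self):
        super().__init__()
        self.contour_to_image = nn.Sequential(
            nn.Conv2d(1, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 3, 3, padding=1), nn.Tanh(),
        )
        self.image_to_contour = nn.Sequential(
            nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 1, 3, padding=1), nn.Tanh(),
        )

    def forward(self, contour):
        image = self.contour_to_image(contour)
        contour_rec = self.image_to_contour(image)
        return image, contour_rec


def generator_loss(image, real_image, contour, contour_rec,
                   lambda_ctx=1.0, lambda_cyc=10.0):
    """Contextual loss (approximated here by an L1 matching term) plus the
    cycle-consistency loss between input and reconstructed contours."""
    ctx = F.l1_loss(image, real_image)            # stand-in for contextual loss
    cyc = F.l1_loss(contour_rec, contour)         # cycle-consistency term
    return lambda_ctx * ctx + lambda_cyc * cyc


if __name__ == "__main__":
    B = 4
    text_emb = torch.randn(B, 256)                # e.g. from a pretrained text encoder
    noise = torch.randn(B, 100)
    stage1, stage2 = ContextualNetwork(), CycleAutoencoder()

    contour = stage1(text_emb, noise)             # text -> contour
    image, contour_rec = stage2(contour)          # contour -> image -> contour
    loss = generator_loss(image, torch.randn(B, 3, 64, 64), contour, contour_rec)
    print(image.shape, loss.item())
```

The reverse image-to-contour branch is what makes the cycle-consistency term possible: penalizing the gap between the input contour and its reconstruction encourages the generated image to preserve the object layout produced in the first stage.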
