Specifying Object Attributes and Relations in Interactive Scene Generation

We introduce a method for the generation of images from an input scene graph. The method separates between a layout embedding and an appearance embedding. The dual embedding leads to generated images that better match the scene graph, have higher visual quality, and support more complex scene graphs. In addition, the embedding scheme supports multiple and diverse output images per scene graph, which can be further controlled by the user. We demonstrate two modes of per-object control: (i) importing elements from other images, and (ii) navigation in the object space by selecting an appearance archetype. Our code is publicly available at https://www.github.com/ashual/scene_generation.

[1]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[2]  Michael S. Bernstein,et al.  Visual Genome: Connecting Language and Vision Using Crowdsourced Dense Image Annotations , 2016, International Journal of Computer Vision.

[3]  Michael S. Bernstein,et al.  Image retrieval using scene graphs , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[4]  Yoshua Bengio,et al.  Generative Adversarial Nets , 2014, NIPS.

[5]  Li Fei-Fei,et al.  Image Generation from Scene Graphs , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[6]  Ah Chung Tsoi,et al.  The Graph Neural Network Model , 2009, IEEE Transactions on Neural Networks.

[7]  Geoffrey E. Hinton,et al.  Visualizing Data using t-SNE , 2008 .

[8]  Bernt Schiele,et al.  Generative Adversarial Text to Image Synthesis , 2016, ICML.

[9]  Alexei A. Efros,et al.  Image-to-Image Translation with Conditional Adversarial Networks , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[10]  Sepp Hochreiter,et al.  GANs Trained by a Two Time-Scale Update Rule Converge to a Local Nash Equilibrium , 2017, NIPS.

[11]  Andrea Vedaldi,et al.  Instance Normalization: The Missing Ingredient for Fast Stylization , 2016, ArXiv.

[12]  José M. F. Moura,et al.  Multiple Source Domain Adaptation with Adversarial Learning , 2018, ICLR.

[13]  Dimitris N. Metaxas,et al.  StackGAN: Text to Photo-Realistic Image Synthesis with Stacked Generative Adversarial Networks , 2016, 2017 IEEE International Conference on Computer Vision (ICCV).

[14]  Alexei A. Efros,et al.  The Unreasonable Effectiveness of Deep Features as a Perceptual Metric , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[15]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[16]  Li Fei-Fei,et al.  Perceptual Losses for Real-Time Style Transfer and Super-Resolution , 2016, ECCV.

[17]  R. Weale Vision. A Computational Investigation Into the Human Representation and Processing of Visual Information. David Marr , 1983 .

[18]  Dumitru Erhan,et al.  Going deeper with convolutions , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[19]  Jimmy Ba,et al.  Adam: A Method for Stochastic Optimization , 2014, ICLR.

[20]  Raymond Y. K. Lau,et al.  Least Squares Generative Adversarial Networks , 2016, 2017 IEEE International Conference on Computer Vision (ICCV).

[21]  Bo Zhao,et al.  Image Generation From Layout , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[22]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[23]  Melanie Mitchell,et al.  Semantic Image Retrieval via Active Grounding of Visual Situations , 2018, 2018 IEEE 12th International Conference on Semantic Computing (ICSC).

[24]  Seunghoon Hong,et al.  Inferring Semantic Layout for Text-to-Image Synthesis , 2018 .

[25]  Bolei Zhou,et al.  GAN Dissection: Visualizing and Understanding Generative Adversarial Networks , 2018, ICLR.

[26]  Jan Kautz,et al.  High-Resolution Image Synthesis and Semantic Manipulation with Conditional GANs , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[27]  Vittorio Ferrari,et al.  COCO-Stuff: Thing and Stuff Classes in Context , 2016, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[28]  Simon Osindero,et al.  Conditional Generative Adversarial Nets , 2014, ArXiv.

[29]  Wojciech Zaremba,et al.  Improved Techniques for Training GANs , 2016, NIPS.