Multi-attribute Pizza Generator: Cross-domain Attribute Control with Conditional StyleGAN

Multi-attribute conditional image generation is a challenging problem in computer vision. We propose Multi-attribute Pizza Generator (MPG), a conditional Generative Neural Network (GAN) framework for synthesizing images from a trichotomy of attributes: content, view-geometry, and implicit visual style. We design MPG by extending the state-of-the-art StyleGAN2, using a new conditioning technique that guides the intermediate feature maps to learn multi-scale multi-attribute entangled representations of controlling attributes. Because of the complex nature of the multi-attribute image generation problem, we regularize the image generation by predicting the explicit conditioning attributes (ingredients and view). To synthesize a pizza image with view attributes outside the range of natural training images, we design a CGI pizza dataset PizzaView using 3D pizza models and employ it to train a view attribute regressor to regularize the generation process, bridging the real and CGI training datasets. To verify the efficacy of MPG, we test it on Pizza10, a carefully annotated multi-ingredient pizza image dataset. MPG can successfully generate photorealistic pizza images with desired ingredients and view attributes, beyond the range of those observed in real-world training data.

[1]  Zhuowen Tu,et al.  Aggregated Residual Transformations for Deep Neural Networks , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[2]  Alexei A. Efros,et al.  Image-to-Image Translation with Conditional Adversarial Networks , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[3]  Honghao Gao,et al.  A Food Dish Image Generation Framework Based on Progressive Growing GANs , 2019, CollaborateCom.

[4]  Simon Osindero,et al.  Conditional Generative Adversarial Nets , 2014, ArXiv.

[5]  Takeru Miyato,et al.  cGANs with Projection Discriminator , 2018, ICLR.

[6]  Jeff Donahue,et al.  Large Scale GAN Training for High Fidelity Natural Image Synthesis , 2018, ICLR.

[7]  Sepp Hochreiter,et al.  GANs Trained by a Two Time-Scale Update Rule Converge to a Local Nash Equilibrium , 2017, NIPS.

[8]  Xiaogang Wang,et al.  StackGAN++: Realistic Image Synthesis with Stacked Generative Adversarial Networks , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[9]  Jing Zhang,et al.  MirrorGAN: Learning Text-To-Image Generation by Redescription , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[10]  Bolei Zhou,et al.  Closed-Form Factorization of Latent Semantics in GANs , 2020, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[11]  Shiguang Shan,et al.  AttGAN: Facial Attribute Editing by Only Changing What You Want , 2017, IEEE Transactions on Image Processing.

[12]  Jung-Woo Ha,et al.  StarGAN: Unified Generative Adversarial Networks for Multi-domain Image-to-Image Translation , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[13]  Antonio Torralba,et al.  How to Make a Pizza: Learning a Compositional Layer-Based GAN Model , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[14]  Jonathon Shlens,et al.  Conditional Image Synthesis with Auxiliary Classifier GANs , 2016, ICML.

[15]  Harshad Rai,et al.  Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks , 2018 .

[16]  Tero Karras,et al.  Training Generative Adversarial Networks with Limited Data , 2020, NeurIPS.

[17]  Jaakko Lehtinen,et al.  Progressive Growing of GANs for Improved Quality, Stability, and Variation , 2017, ICLR.

[18]  Ori Bar El,et al.  GILT: Generating Images from Long Text , 2019, ArXiv.

[19]  Gerasimos Spanakis,et al.  LoGANv2: Conditional Style-Based Logo Generation with Generative Adversarial Networks , 2019, 2019 18th IEEE International Conference On Machine Learning And Applications (ICMLA).

[20]  Zhe Gan,et al.  AttnGAN: Fine-Grained Text to Image Generation with Attentional Generative Adversarial Networks , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[21]  Jaakko Lehtinen,et al.  Analyzing and Improving the Image Quality of StyleGAN , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[22]  Harris Drucker,et al.  Improving generalization performance using double backpropagation , 1992, IEEE Trans. Neural Networks.

[23]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[24]  Bin Zhu,et al.  CookGAN: Causality Based Text-to-Image Synthesis , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[25]  Yuichi Yoshida,et al.  Spectral Normalization for Generative Adversarial Networks , 2018, ICLR.

[26]  Yoshua Bengio,et al.  Generative Adversarial Nets , 2014, NIPS.

[27]  Timo Aila,et al.  A Style-Based Generator Architecture for Generative Adversarial Networks , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[28]  Wataru Shimoda,et al.  Food image generation using a large amount of food images with conditional GAN: ramenGAN and recipeGAN , 2018, MADiMa@IJCAI.

[29]  Vladimir Pavlovic,et al.  CookGAN: Meal Image Synthesis from Ingredients , 2020, 2020 IEEE Winter Conference on Applications of Computer Vision (WACV).