Scene Generation from Backgrounds to Objects and Anything in Between: A Deep Learning Robotics Survey

Rapid recent progress in deep learning algorithms for generating realistic images, especially Generative Adversarial Networks (GANs) and Variational Auto-Encoders (VAEs), has enabled new applications. These range from generating and manipulating synthetic data for self-driving cars, to building and urban architecture, interior design, and gaming. Other applications have also benefited from advances in deep generative models, such as robotic manipulation in structured and unstructured environments, virtual clothing try-on, and item identification on the go. This survey reviews techniques for image generation, from outdoor background scenes to building facades and objects, and anything in between. In particular, we cover scene generation for outdoor landscapes, building facades, and indoor scenes. For each category, we compare the existing state-of-the-art algorithms and techniques, and discuss their performance and limitations on a wide variety of inputs. Additionally, we discuss challenges and future trends for advancing the state of the art in realistic image generation.
