Geometric Image Synthesis

The task of generating natural images from 3D scenes has been a long standing goal in computer graphics. On the other hand, recent developments in deep neural networks allow for trainable models that can produce natural-looking images with little or no knowledge about the scene structure. While the generated images often consist of realistic looking local patterns, the overall structure of the generated images is often inconsistent. In this work we propose a trainable, geometry-aware image generation method that leverages various types of scene information, including geometry and segmentation, to create realistic looking natural images that match the desired scene structure. Our geometrically-consistent image synthesis method is a deep neural network, called Geometry to Image Synthesis (GIS) framework, which retains the advantages of a trainable method, e.g., differentiability and adaptiveness, but, at the same time, makes a step towards the generalizability, control and quality output of modern graphics rendering engines. We utilize the GIS framework to insert vehicles in outdoor driving scenes, as well as to generate novel views of objects from the Linemod dataset. We qualitatively show that our network is able to generalize beyond the training set to novel scene geometries, object shapes and segmentations. Furthermore, we quantitatively show that the GIS framework can be used to synthesize large amounts of training data which proves beneficial for training instance segmentation models.

[1]  Visvanathan Ramesh,et al.  Model-driven Simulations for Deep Convolutional Neural Networks , 2016, ArXiv.

[2]  Vladlen Koltun,et al.  Photographic Image Synthesis with Cascaded Refinement Networks , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[3]  Cewu Lu,et al.  Generating Instance Segmentation Annotation by Geometry-guided GAN , 2018, ArXiv.

[4]  Soumith Chintala,et al.  Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks , 2015, ICLR.

[5]  Thomas Brox,et al.  Learning to Generate Chairs, Tables and Cars with Convolutional Networks , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[6]  Andreas Geiger,et al.  Augmented Reality meets Deep Learning , 2017, BMVC.

[7]  Jian Sun,et al.  Instance-Aware Semantic Segmentation via Multi-task Network Cascades , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[8]  Vincent Lepetit,et al.  Model Based Training, Detection and Pose Estimation of Texture-Less 3D Objects in Heavily Cluttered Scenes , 2012, ACCV.

[9]  Deng Cai,et al.  On the Diversity of Realistic Image Synthesis , 2017, ArXiv.

[10]  Sebastian Nowozin,et al.  Stabilizing Training of Generative Adversarial Networks through Regularization , 2017, NIPS.

[11]  Cordelia Schmid,et al.  Learning from Synthetic Humans , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[12]  Vladlen Koltun,et al.  Playing for Benchmarks , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[13]  Derek Hoiem,et al.  Indoor Segmentation and Support Inference from RGBD Images , 2012, ECCV.

[14]  Yoshua Bengio,et al.  Generative Adversarial Nets , 2014, NIPS.

[15]  Zhenhua Wang,et al.  Synthesizing Training Images for Boosting Human 3D Pose Estimation , 2016, 2016 Fourth International Conference on 3D Vision (3DV).

[16]  Antonio M. López,et al.  The SYNTHIA Dataset: A Large Collection of Synthetic Images for Semantic Segmentation of Urban Scenes , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[17]  Vladlen Koltun,et al.  Playing for Data: Ground Truth from Computer Games , 2016, ECCV.

[18]  Matthew Johnson-Roberson,et al.  Driving in the Matrix: Can virtual worlds replace human-generated annotations for real world tasks? , 2016, 2017 IEEE International Conference on Robotics and Automation (ICRA).

[19]  Gernot Riegler,et al.  OctNet: Learning Deep 3D Representations at High Resolutions , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[20]  Thomas Brox,et al.  A Large Dataset to Train Convolutional Networks for Disparity, Optical Flow, and Scene Flow Estimation , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[21]  Leon A. Gatys,et al.  A Neural Algorithm of Artistic Style , 2015, ArXiv.

[22]  Alexei A. Efros,et al.  Image-to-Image Translation with Conditional Adversarial Networks , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[23]  Takeo Kanade,et al.  Learning scene-specific pedestrian detectors without real data , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[24]  Ersin Yumer,et al.  Physically-Based Rendering for Indoor Scene Understanding Using Convolutional Neural Networks , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[25]  Antonio Manuel López Peña,et al.  Procedural Generation of Videos to Train Deep Action Recognition Networks , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[26]  Magnus Wrenninge,et al.  Procedural Modeling and Physically Based Rendering for Synthetic Data Generation in Automotive Applications , 2017, ArXiv.

[27]  Varun Jampani,et al.  Training Deep Networks with Synthetic Data: Bridging the Reality Gap by Domain Randomization , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[28]  Andreas Geiger,et al.  Are we ready for autonomous driving? The KITTI vision benchmark suite , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[29]  Jan Kautz,et al.  High-Resolution Image Synthesis and Semantic Manipulation with Conditional GANs , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[30]  Stefan Leutenegger,et al.  SceneNet RGB-D: Can 5M Synthetic Images Beat Generic ImageNet Pre-training on Indoor Segmentation? , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[31]  Qiao Wang,et al.  VirtualWorlds as Proxy for Multi-object Tracking Analysis , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[32]  Eric Brachmann,et al.  Global Hypothesis Generation for 6D Object Pose Estimation , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[33]  Andreas Geiger,et al.  Augmented Reality Meets Computer Vision: Efficient Data Generation for Urban Driving Scenes , 2017, International Journal of Computer Vision.

[34]  Kaiming He,et al.  Mask R-CNN , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[35]  Dinesh Manocha,et al.  STD-PD: Generating Synthetic Training Data for Pedestrian Detection in Unannotated Videos , 2017, ArXiv.

[36]  Pushmeet Kohli,et al.  Multiple Choice Learning: Learning to Produce Multiple Structured Outputs , 2012, NIPS.

[37]  Abhinav Gupta,et al.  Generative Image Modeling Using Style and Structure Adversarial Networks , 2016, ECCV.

[38]  Daniel Cremers,et al.  What Makes Good Synthetic Training Data for Learning Disparity and Optical Flow Estimation? , 2018, International Journal of Computer Vision.

[39]  Rob Fergus,et al.  Deep Generative Image Models using a Laplacian Pyramid of Adversarial Networks , 2015, NIPS.

[40]  Visvanathan Ramesh,et al.  Model-Driven Simulations for Computer Vision , 2017, 2017 IEEE Winter Conference on Applications of Computer Vision (WACV).

[41]  Martial Hebert,et al.  Cut, Paste and Learn: Surprisingly Easy Synthesis for Instance Detection , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).