DeceptionNet: Network-Driven Domain Randomization

We present a novel approach to tackle domain adaptation between synthetic and real data. Instead, of employing "blind" domain randomization, i.e., augmenting synthetic renderings with random backgrounds or changing illumination and colorization, we leverage the task network as its own adversarial guide toward useful augmentations that maximize the uncertainty of the output. To this end, we design a min-max optimization scheme where a given task competes against a special deception network to minimize the task error subject to the specific constraints enforced by the deceiver. The deception network samples from a family of differentiable pixel-level perturbations and exploits the task architecture to find the most destructive augmentations. Unlike GAN-based approaches that require unlabeled data from the target domain, our method achieves robust mappings that scale well to multiple target distributions from source data alone. We apply our framework to the tasks of digit recognition on enhanced MNIST variants, classification and object pose estimation on the Cropped LineMOD dataset as well as semantic segmentation on the Cityscapes dataset and compare it to a number of domain adaptation approaches, thereby demonstrating similar results with superior generalization capabilities.

[1]  Antonio M. López,et al.  The SYNTHIA Dataset: A Large Collection of Synthetic Images for Semantic Segmentation of Urban Scenes , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[2]  Patrice Y. Simard,et al.  Best practices for convolutional neural networks applied to visual document analysis , 2003, Seventh International Conference on Document Analysis and Recognition, 2003. Proceedings..

[3]  Tatsuya Harada,et al.  Neural 3D Mesh Renderer , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[4]  Jie Li,et al.  SPIGAN: Privileged Adversarial Learning from Simulation , 2018, ICLR.

[5]  Rob Fergus,et al.  Visualizing and Understanding Convolutional Networks , 2013, ECCV.

[6]  Manolis I. A. Lourakis,et al.  T-LESS: An RGB-D Dataset for 6D Pose Estimation of Texture-Less Objects , 2017, 2017 IEEE Winter Conference on Applications of Computer Vision (WACV).

[7]  Jakub W. Pachocki,et al.  Learning dexterous in-hand manipulation , 2018, Int. J. Robotics Res..

[8]  Slobodan Ilic,et al.  Seeing Beyond Appearance - Mapping Real Images into Geometrical Domains for Unsupervised CAD-based Recognition , 2018, 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).

[9]  Sergey Levine,et al.  (CAD)$^2$RL: Real Single-Image Flight without a Single Real Image , 2016, Robotics: Science and Systems.

[10]  Charless C. Fowlkes,et al.  Contour Detection and Hierarchical Image Segmentation , 2011, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[11]  Maneesh Kumar Singh,et al.  DRIT++: Diverse Image-to-Image Translation via Disentangled Representations , 2019, International Journal of Computer Vision.

[12]  Victor S. Lempitsky,et al.  Unsupervised Domain Adaptation by Backpropagation , 2014, ICML.

[13]  Alexei A. Efros,et al.  Unpaired Image-to-Image Translation Using Cycle-Consistent Adversarial Networks , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[14]  Andrew Zisserman,et al.  Spatial Transformer Networks , 2015, NIPS.

[15]  Luc Van Gool,et al.  ROAD: Reality Oriented Adaptation for Semantic Segmentation of Urban Scenes , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[16]  Tomas Pfister,et al.  Learning from Simulated and Unsupervised Images through Adversarial Training , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[17]  François Laviolette,et al.  Domain-Adversarial Training of Neural Networks , 2015, J. Mach. Learn. Res..

[18]  Graham W. Taylor,et al.  Dataset Augmentation in Feature Space , 2017, ICLR.

[19]  Michael I. Jordan,et al.  Learning Transferable Features with Deep Adaptation Networks , 2015, ICML.

[20]  Luc Van Gool,et al.  Learning Semantic Segmentation From Synthetic Data: A Geometrically Guided Input-Output Adaptation Approach , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[21]  Slobodan Ilic,et al.  DPOD: Dense 6D Pose Object Detector in RGB images , 2019, ArXiv.

[22]  Kouichi Sakurai,et al.  One Pixel Attack for Fooling Deep Neural Networks , 2017, IEEE Transactions on Evolutionary Computation.

[23]  Leon Sixt,et al.  RenderGAN: Generating Realistic Labeled Data , 2016, Front. Robot. AI.

[24]  Trevor Darrell,et al.  Adversarial Discriminative Domain Adaptation , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[25]  Lior Wolf,et al.  Unsupervised Cross-Domain Image Generation , 2016, ICLR.

[26]  Steven Worley,et al.  A cellular texture basis function , 1996, SIGGRAPH.

[27]  Peter A. Beling,et al.  Simulating Kinect Infrared and Depth Images , 2016, IEEE Transactions on Cybernetics.

[28]  Wojciech Zaremba,et al.  Domain randomization for transferring deep neural networks from simulation to the real world , 2017, 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).

[29]  Leon A. Gatys,et al.  Image Style Transfer Using Convolutional Neural Networks , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[30]  Bui Tuong Phong Illumination for computer generated pictures , 1975, Commun. ACM.

[31]  Du Q. Huynh,et al.  Metrics for 3D Rotations: Comparison and Analysis , 2009, Journal of Mathematical Imaging and Vision.

[32]  Kenneth Y. Goldberg,et al.  Learning Deep Policies for Robot Bin Picking by Simulating Robust Grasping Sequences , 2017, CoRL.

[33]  拓海 杉山,et al.  “Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks”の学習報告 , 2017 .

[34]  Trevor Darrell,et al.  Deep Domain Confusion: Maximizing for Domain Invariance , 2014, CVPR 2014.

[35]  Christopher Ré,et al.  Learning to Compose Domain-Specific Transformations for Data Augmentation , 2017, NIPS.

[36]  Philip David,et al.  Domain Adaptation for Semantic Segmentation of Urban Scenes , 2017 .

[37]  Vincent Lepetit,et al.  On Pre-Trained Image Features and Synthetic Images for Deep Learning , 2017, ECCV Workshops.

[38]  Nassir Navab,et al.  SSD-6D: Making RGB-Based 3D Detection and 6D Pose Estimation Great Again , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[39]  George Trigeorgis,et al.  Domain Separation Networks , 2016, NIPS.

[40]  Amos J. Storkey,et al.  Data Augmentation Generative Adversarial Networks , 2017, ICLR 2018.

[41]  Zoltan-Csaba Marton,et al.  Implicit 3D Orientation Learning for 6D Object Detection from RGB Images , 2018, ECCV.

[42]  Vincent Lepetit,et al.  Model Based Training, Detection and Pose Estimation of Texture-Less 3D Objects in Heavily Cluttered Scenes , 2012, ACCV.

[43]  Dumitru Erhan,et al.  Unsupervised Pixel-Level Domain Adaptation with Generative Adversarial Networks , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[44]  David Stutz,et al.  Neural Codes for Image Retrieval , 2015 .

[45]  Slobodan Ilic,et al.  Keep it Unreal: Bridging the Realism Gap for 2.5D Recognition with Geometry Priors Only , 2018, 2018 International Conference on 3D Vision (3DV).

[46]  Trevor Darrell,et al.  FCNs in the Wild: Pixel-level Adversarial and Constraint-based Adaptation , 2016, ArXiv.

[47]  Swami Sankaranarayanan,et al.  Learning from Synthetic Data: Addressing Domain Shift for Semantic Segmentation , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[48]  Vincent Lepetit,et al.  Feature Mapping for Learning Fast and Accurate 3D Pose Inference from Synthetic Images , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[49]  Christian Ledig,et al.  Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[50]  Pietro Perona,et al.  Microsoft COCO: Common Objects in Context , 2014, ECCV.

[51]  Slobodan Ilic,et al.  DPOD: 6D Pose Object Detector and Refiner , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[52]  Vincent Lepetit,et al.  Learning descriptors for object recognition and 3D pose estimation , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[53]  Martín Abadi,et al.  Adversarial Patch , 2017, ArXiv.

[54]  Ivan Laptev,et al.  Learning and Transferring Mid-level Image Representations Using Convolutional Neural Networks , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[55]  Ming-Yu Liu,et al.  Coupled Generative Adversarial Networks , 2016, NIPS.

[56]  Ming-Hsuan Yang,et al.  Learning to Adapt Structured Output Space for Semantic Segmentation , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[57]  Min Sun,et al.  No More Discrimination: Cross City Adaptation of Road Scene Segmenters , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[58]  Donald A. Adjeroh,et al.  Unified Deep Supervised Domain Adaptation and Generalization , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[59]  Nassir Navab,et al.  Deep Model-Based 6D Pose Refinement in RGB , 2018, ECCV.

[60]  Jaakko Lehtinen,et al.  Differentiable Monte Carlo ray tracing through edge sampling , 2018, ACM Trans. Graph..

[61]  Ken Perlin,et al.  Improving noise , 2002, SIGGRAPH.

[62]  Ziyan Wu,et al.  DepthSynth: Real-Time Realistic Synthetic Data Generation from CAD Models for 2.5D Recognition , 2017, 2017 International Conference on 3D Vision (3DV).

[63]  Sebastian Ramos,et al.  The Cityscapes Dataset for Semantic Urban Scene Understanding , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).