DCL: Differential Contrastive Learning for Geometry-Aware Depth Synthesis

We describe a method for realistic depth synthesis that learns diverse variations from real depth scans while ensuring geometric consistency, enabling effective synthetic-to-real transfer. Unlike general image synthesis pipelines, where geometry is mostly ignored, we treat the geometry carried by depth scans as a property in its own right. We propose differential contrastive learning, which explicitly enforces the underlying geometric properties to remain invariant to the real-world variations being learned. The resulting depth synthesis method is task-agnostic and can be used to train any task-specific network with synthetic labels. We demonstrate the effectiveness of the proposed method through extensive evaluations on downstream real-world geometric reasoning tasks, where it achieves better synthetic-to-real transfer performance than other state-of-the-art methods. When fine-tuned on a small number of real-world annotations, our method can even surpass fully supervised baselines.
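The abstract does not specify which geometric properties are kept invariant or how the contrastive objective is built, so the following is only a rough sketch under stated assumptions, not the DCL formulation. We assume the geometric property is the per-pixel surface normal, a first-order differential quantity estimated from depth gradients (treating depth as a height field and ignoring camera intrinsics), and we use a standard InfoNCE-style loss: the normal at a pixel of the synthetic input should match the normal at the same pixel of the translated output (the positive) rather than at other sampled pixels (the negatives). The helper names `depth_to_normals` and `patch_info_nce`, the sample count, and the temperature are all hypothetical.

```python
import numpy as np


def depth_to_normals(depth):
    """Per-pixel unit surface normals from a depth map (hypothetical helper).

    Treats the depth map as a height field z = D(v, u) and ignores camera
    intrinsics (a simplifying assumption), so the unnormalized normal is
    (-dz/du, -dz/dv, 1), estimated with finite differences.
    """
    depth = np.asarray(depth, dtype=np.float64)
    dz_dv, dz_du = np.gradient(depth)  # derivatives along rows (v) and columns (u)
    normals = np.stack([-dz_du, -dz_dv, np.ones_like(depth)], axis=-1)
    return normals / np.linalg.norm(normals, axis=-1, keepdims=True)


def patch_info_nce(feat_a, feat_b, num_samples=64, temperature=0.07, seed=0):
    """InfoNCE over randomly sampled pixel locations (hypothetical helper).

    The feature at location i in feat_a is pulled toward location i in feat_b
    (the positive) and pushed away from the other sampled locations (negatives).
    """
    rng = np.random.default_rng(seed)
    h, w, c = feat_a.shape
    idx = rng.choice(h * w, size=min(num_samples, h * w), replace=False)
    a = feat_a.reshape(-1, c)[idx]
    b = feat_b.reshape(-1, c)[idx]
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    logits = a @ b.T / temperature  # (N, N) cosine similarities; diagonal = positives
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_prob)))


# Toy "synthetic" depth map with varied local geometry (units are arbitrary).
yy, xx = np.mgrid[0:32, 0:32].astype(np.float64)
synthetic = 20.0 + 5.0 * np.sin(xx / 4.0) * np.cos(yy / 4.0)

# A translation that only shifts depth keeps every normal intact...
shifted = synthetic + 0.3
# ...while heavy per-pixel noise scrambles the local geometry.
noisy = synthetic + np.random.default_rng(1).normal(0.0, 2.0, synthetic.shape)

n_syn = depth_to_normals(synthetic)
loss_same = patch_info_nce(n_syn, depth_to_normals(shifted))
loss_noisy = patch_info_nce(n_syn, depth_to_normals(noisy))
print(f"geometry-preserving: {loss_same:.3f}  geometry-breaking: {loss_noisy:.3f}")
```

Under these assumptions, a constant depth offset leaves every normal unchanged, so each positive is the most similar entry in its row and the loss stays low, while per-pixel noise scrambles the normals and raises the loss. A translation network would minimize a term of this kind alongside whatever realism objective it uses, so that the learned variations do not alter the underlying geometry.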
