Multitask GANs for Semantic Segmentation and Depth Completion With Cycle Consistency

Semantic segmentation and depth completion are two challenging tasks in scene understanding, and they are widely used in robotics and autonomous driving. Although several works are proposed to jointly train these two tasks using some small modifications, like changing the last layer, the result of one task is not utilized to improve the performance of the other one despite that there are some similarities between these two tasks. In this paper, we propose multi-task generative adversarial networks (Multi-task GANs), which are not only competent in semantic segmentation and depth completion, but also improve the accuracy of depth completion through generated semantic images. In addition, we improve the details of generated semantic images based on CycleGAN by introducing multi-scale spatial pooling blocks and the structural similarity reconstruction loss. Furthermore, considering the inner consistency between semantic and geometric structures, we develop a semantic-guided smoothness loss to improve depth completion results. Extensive experiments on Cityscapes dataset and KITTI depth completion benchmark show that the Multi-task GANs are capable of achieving competitive performance for both semantic segmentation and depth completion tasks.

[1]  Ching-Te Chiu,et al.  Real-Time Object Detection With Reduced Region Proposal Network via Multi-Feature Concatenation , 2020, IEEE Transactions on Neural Networks and Learning Systems.

[2]  Trevor Darrell,et al.  FCNs in the Wild: Pixel-level Adversarial and Constraint-based Adaptation , 2016, ArXiv.

[3]  Leon A. Gatys,et al.  Image Style Transfer Using Convolutional Neural Networks , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[4]  Yang Tang,et al.  Monocular depth estimation based on deep learning: An overview , 2020, Science China Technological Sciences.

[5]  拓海 杉山,et al.  “Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks”の学習報告 , 2017 .

[6]  Hanqing Lu,et al.  Collaborative Deconvolutional Neural Networks for Joint Depth Estimation and Semantic Segmentation , 2018, IEEE Transactions on Neural Networks and Learning Systems.

[7]  Alexander H. Liu,et al.  Towards Scene Understanding: Unsupervised Monocular Depth Estimation With Semantic-Aware Representation , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[8]  Ian D. Reid,et al.  RefineNet: Multi-path Refinement Networks for High-Resolution Semantic Segmentation , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[9]  Nick Barnes,et al.  From Depth What Can You See? Depth Completion via Auxiliary Image Reconstruction , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[10]  Taesung Park,et al.  CyCADA: Cycle-Consistent Adversarial Domain Adaptation , 2017, ICML.

[11]  Ruigang Yang,et al.  Depth Estimation via Affinity Learned with Convolutional Spatial Propagation Network , 2018, ECCV.

[12]  Duc Thanh Nguyen,et al.  JSIS3D: Joint Semantic-Instance Segmentation of 3D Point Clouds With Multi-Task Pointwise Networks and Multi-Value Conditional Random Fields , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[13]  Tomas Pfister,et al.  Learning from Simulated and Unsupervised Images through Adversarial Training , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[14]  Michael Felsberg,et al.  Confidence Propagation through CNNs for Guided Sparse Depth Regression , 2018, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[15]  Yang Tang,et al.  An Overview of Perception and Decision-Making in Autonomous Systems in the Era of Learning , 2020 .

[16]  Vladlen Koltun,et al.  Playing for Data: Ground Truth from Computer Games , 2016, ECCV.

[17]  Hujun Bao,et al.  Depth Completion From Sparse LiDAR Data With Depth-Normal Constraints , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[18]  Kun Zhang,et al.  Geometry-Consistent Generative Adversarial Networks for One-Sided Unsupervised Domain Mapping , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[19]  Ming-Yu Liu,et al.  Coupled Generative Adversarial Networks , 2016, NIPS.

[20]  Yang Tang,et al.  When Autonomous Systems Meet Accuracy and Transferability through AI: A Survey , 2020, Patterns.

[21]  Yang Tang,et al.  Masked GAN for Unsupervised Depth and Pose Prediction With Scale Consistency , 2020, IEEE Transactions on Neural Networks and Learning Systems.

[22]  Eero P. Simoncelli,et al.  Image quality assessment: from error visibility to structural similarity , 2004, IEEE Transactions on Image Processing.

[23]  Fawzi Nashashibi,et al.  Sparse and Dense Data with CNNs: Depth Completion and Semantic Segmentation , 2018, 2018 International Conference on 3D Vision (3DV).

[24]  Yoshua Bengio,et al.  Generative Adversarial Nets , 2014, NIPS.

[25]  Zhiyu Xiang,et al.  Simultaneous Semantic Segmentation and Depth Completion with Constraint of Boundary , 2020, Sensors.

[26]  Ryan M. Eustice,et al.  Fast LIDAR localization using multiresolution Gaussian mixture maps , 2015, 2015 IEEE International Conference on Robotics and Automation (ICRA).

[27]  Dong Liu,et al.  Fully Convolutional Adaptation Networks for Semantic Segmentation , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[28]  Ling Shao,et al.  Transfer Learning for Visual Categorization: A Survey , 2015, IEEE Transactions on Neural Networks and Learning Systems.

[29]  Nicu Sebe,et al.  Multi-Channel Attention Selection GAN With Cascaded Semantic Guidance for Cross-View Image Translation , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[30]  Sertac Karaman,et al.  Self-Supervised Sparse-to-Dense: Self-Supervised Depth Completion from LiDAR and Monocular Camera , 2018, 2019 International Conference on Robotics and Automation (ICRA).

[31]  Alexei A. Efros,et al.  Image-to-Image Translation with Conditional Adversarial Networks , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[32]  Ming Yang,et al.  Conditional Generative Adversarial Network for Structured Domain Adaptation , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[33]  Ji Zhang,et al.  LOAM: Lidar Odometry and Mapping in Real-time , 2014, Robotics: Science and Systems.

[34]  Iasonas Kokkinos,et al.  DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs , 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[35]  Sheng Tang,et al.  Asymmetric GAN for Unpaired Image-to-Image Translation , 2019, IEEE Transactions on Image Processing.

[36]  Simon Osindero,et al.  Conditional Generative Adversarial Nets , 2014, ArXiv.

[37]  Andreas Geiger,et al.  Vision meets robotics: The KITTI dataset , 2013, Int. J. Robotics Res..

[38]  M. Pollefeys,et al.  DeepLiDAR: Deep Surface Normal Guided Depth Prediction for Outdoor Scene From Sparse LiDAR Data and Single Color Image , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[39]  Yi Yang,et al.  Taking a Closer Look at Domain Shift: Category-Level Adversaries for Semantics Consistent Domain Adaptation , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[40]  François Rameau,et al.  Unsupervised Intra-Domain Adaptation for Semantic Segmentation Through Self-Supervision , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[41]  Fuzhen Zhuang,et al.  Deep Subdomain Adaptation Network for Image Classification , 2020, IEEE Transactions on Neural Networks and Learning Systems.

[42]  Sertac Karaman,et al.  Sparse-to-Dense: Depth Prediction from Sparse Depth Samples and a Single Image , 2017, 2018 IEEE International Conference on Robotics and Automation (ICRA).

[43]  Ping Tan,et al.  DualGAN: Unsupervised Dual Learning for Image-to-Image Translation , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[44]  José García Rodríguez,et al.  A Review on Deep Learning Techniques Applied to Semantic Segmentation , 2017, ArXiv.

[45]  Xiaogang Wang,et al.  Pyramid Scene Parsing Network , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[46]  Yu Zhang,et al.  A Survey on Multi-Task Learning , 2017, IEEE Transactions on Knowledge and Data Engineering.

[47]  Nicu Sebe,et al.  Pattern-Affinitive Propagation Across Depth, Surface Normal and Semantic Segmentation , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[48]  Massimiliano Pontil,et al.  Multi-Task Feature Learning , 2006, NIPS.

[49]  Sebastian Ramos,et al.  The Cityscapes Dataset for Semantic Urban Scene Understanding , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[50]  Ming-Hsuan Yang,et al.  CrDoCo: Pixel-Level Domain Transfer With Cross-Domain Consistency , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[51]  Trevor Darrell,et al.  Fully Convolutional Networks for Semantic Segmentation , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[52]  Thomas Brox,et al.  Sparsity Invariant CNNs , 2017, 2017 International Conference on 3D Vision (3DV).