Competitive Simplicity for Multi-Task Learning for Real-Time Foggy Scene Understanding via Domain Adaptation

Automotive scene understanding under adverse weather conditions raises a realistic and challenging problem attributable to poor outdoor scene visibility (e.g. foggy weather). However, because most contemporary scene understanding approaches are applied under ideal-weather conditions, such approaches may not provide genuinely optimal performance when compared to established a priori insights on extreme-weather understanding. In this paper, we propose a complex but competitive multi-task learning approach capable of performing in real-time semantic scene understanding and monocular depth estimation under foggy weather conditions by leveraging both recent advances in adversarial training and domain adaptation. As an end-to-end pipeline, our model provides a novel solution to surpass degraded visibility in foggy weather conditions by transferring scenes from foggy to normal using a GAN-based model. For optimal performance in semantic segmentation, our model generates depth to be used as complementary source information with RGB in the segmentation network. We provide a robust method for foggy scene understanding by training two models (normal and foggy) simultaneously with shared weights (each model is trained on each weather condition independently). Our model incorporates RGB colour, depth, and luminance images via distinct encoders with dense connectivity and features fusing, and leverages skip connections to produce consistent depth and segmentation predictions. Using this architectural formulation with light computational complexity at inference time, we are able to achieve comparable performance to contemporary approaches at a fraction of the overall model complexity.

[1]  Toby P. Breckon,et al.  Veritatem Dies Aperit - Temporally Consistent Depth Prediction Enabled by a Multi-Task Geometric and Semantic Scene Understanding Approach , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[2]  Iasonas Kokkinos,et al.  DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs , 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[3]  L. Gool,et al.  Semantic Understanding of Foggy Scenes with Purely Synthetic Data , 2019, 2019 IEEE Intelligent Transportation Systems Conference (ITSC).

[4]  Daniel Cremers,et al.  Modular Vehicle Control for Transferring Semantic Information to Unseen Weather Conditions using GANs , 2018, CoRL.

[5]  Gang Yu,et al.  BiSeNet: Bilateral Segmentation Network for Real-time Semantic Segmentation , 2018, ECCV.

[6]  Toby P. Breckon,et al.  Real-Time Monocular Depth Estimation Using Synthetic Data with Domain Adaptation via Image Style Transfer , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[7]  Alexei A. Efros,et al.  Image-to-Image Translation with Conditional Adversarial Networks , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[8]  Jimmy Ba,et al.  Adam: A Method for Stochastic Optimization , 2014, ICLR.

[9]  Luc Van Gool,et al.  Guided Curriculum Model Adaptation and Uncertainty-Aware Evaluation for Semantic Nighttime Image Segmentation , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[10]  Xiaogang Wang,et al.  Pyramid Scene Parsing Network , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[11]  Taeyoung Kim,et al.  PCA Based Computation of Illumination-Invariant Space for Road Detection , 2017, 2017 IEEE Winter Conference on Applications of Computer Vision (WACV).

[12]  Ian D. Reid,et al.  RefineNet: Multi-path Refinement Networks for High-Resolution Semantic Segmentation , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[13]  Vladlen Koltun,et al.  Multi-Scale Context Aggregation by Dilated Convolutions , 2015, ICLR.

[14]  Roberto Cipolla,et al.  SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[15]  Jiaying Liu,et al.  Demystifying Neural Style Transfer , 2017, IJCAI.

[16]  Daniel Cremers,et al.  FuseNet: Incorporating Depth into Semantic Segmentation via Fusion-Based CNN Architecture , 2016, ACCV.

[17]  Luca Antiga,et al.  Automatic differentiation in PyTorch , 2017 .

[18]  Rob Fergus,et al.  Depth Map Prediction from a Single Image using a Multi-Scale Deep Network , 2014, NIPS.

[19]  Gustavo Carneiro,et al.  Unsupervised CNN for Single View Depth Estimation: Geometry to the Rescue , 2016, ECCV.

[20]  Harshad Rai,et al.  Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks , 2018 .

[21]  Oisin Mac Aodha,et al.  Unsupervised Monocular Depth Estimation with Left-Right Consistency , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[22]  Trevor Darrell,et al.  Fully Convolutional Networks for Semantic Segmentation , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[23]  Yoshua Bengio,et al.  Generative Adversarial Nets , 2014, NIPS.

[24]  Thomas Brox,et al.  U-Net: Convolutional Networks for Biomedical Image Segmentation , 2015, MICCAI.

[25]  Luc Van Gool,et al.  Semantic Foggy Scene Understanding with Synthetic Data , 2017, International Journal of Computer Vision.

[26]  Luc Van Gool,et al.  Model Adaptation with Synthetic and Real Data for Semantic Dense Foggy Scene Understanding , 2018, ECCV.

[27]  拓海 杉山,et al.  “Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks”の学習報告 , 2017 .

[28]  Ming-Hsuan Yang,et al.  Learning to Adapt Structured Output Space for Semantic Segmentation , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[29]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[30]  Sebastian Ramos,et al.  The Cityscapes Dataset for Semantic Urban Scene Understanding , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[31]  Dengxin Dai,et al.  Curriculum Model Adaptation with Synthetic and Real Data for Semantic Foggy Scene Understanding , 2019, International Journal of Computer Vision.

[32]  Toby P. Breckon,et al.  On the Impact of Illumination-Invariant Image Pre-transformation for Contemporary Automotive Semantic Scene Understanding , 2018, 2018 IEEE Intelligent Vehicles Symposium (IV).

[33]  Shao-Yuan Lo,et al.  Incorporating Luminance, Depth and Color Information by a Fusion-Based Network for Semantic Segmentation , 2018, 2019 IEEE International Conference on Image Processing (ICIP).

[34]  Toby P. Breckon,et al.  Generalized dynamic object removal for dense stereo vision based scene mapping using synthesised optical flow , 2016, 2016 IEEE International Conference on Image Processing (ICIP).

[35]  Roberto Cipolla,et al.  Multi-task Learning Using Uncertainty to Weigh Losses for Scene Geometry and Semantics , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[36]  Eduardo Romera,et al.  ERFNet: Efficient Residual Factorized ConvNet for Real-Time Semantic Segmentation , 2018, IEEE Transactions on Intelligent Transportation Systems.

[37]  Ian D. Reid,et al.  Learning Depth from Single Monocular Images Using Deep Convolutional Neural Fields , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.