Semantic Nighttime Image Segmentation with Synthetic Stylized Data, Gradual Adaptation and Uncertainty-Aware Evaluation

This work addresses semantic segmentation of nighttime images. Recent progress in semantic segmentation has mostly concerned daytime scenes with favorable illumination. We focus on improving the performance of state-of-the-art methods in the nighttime domain by adapting them to nighttime data without extra annotations, and on designing a new evaluation framework that accounts for the uncertainty of semantics in nighttime images. To this end, we make the following contributions: 1) a novel pipeline for dataset-scale guided style transfer that generates synthetic nighttime images from real daytime input; 2) a framework that gradually adapts semantic segmentation models from day to night via stylized and real images of progressively increasing darkness; 3) a novel uncertainty-aware annotation and evaluation framework and metric for semantic segmentation in adverse conditions; 4) the Dark Zurich dataset, with 2416 nighttime and 2920 twilight unlabeled images plus 20 nighttime images with pixel-level annotations that conform to our newly proposed evaluation. Our experiments show that both our stylized data on their own and our gradual adaptation significantly boost nighttime performance, under both standard evaluation metrics and our metric. Moreover, our new evaluation reveals that state-of-the-art segmentation models output overly confident predictions in indiscernible regions compared to visible ones.
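
To make the idea of uncertainty-aware evaluation concrete, the following is a minimal sketch of a confidence-thresholded IoU. It is not the metric proposed in this work, whose exact definition is not given in this abstract. The function name `uncertainty_aware_iou`, the `discernible` mask, the threshold `theta`, and the counting rules are all assumptions made for illustration: a prediction with confidence below `theta` is treated as an abstention, abstaining on an indiscernible pixel is rewarded, and a confident prediction there is penalized.

```python
def uncertainty_aware_iou(pred, conf, gt, discernible, cls, theta=0.5):
    """Illustrative per-class IoU that rewards abstaining on indiscernible pixels.

    pred, conf, gt, discernible are flat, equal-length sequences (one entry per pixel).
    All names and counting rules here are hypothetical, not taken from the paper.
    """
    tp = fp = fn = ti = fi = 0
    for p, c, g, d in zip(pred, conf, gt, discernible):
        confident_cls = (p == cls and c >= theta)  # confident vote for this class
        abstains = c < theta                       # low confidence = abstention
        if d:  # pixel whose semantics are visible
            if g == cls:
                if confident_cls:
                    tp += 1   # correct and confident
                else:
                    fn += 1   # missed, wrong, or abstained
            elif confident_cls:
                fp += 1       # confident vote for the wrong class
        else:  # pixel whose semantics are indiscernible
            if g == cls:
                if abstains:
                    ti += 1   # "true invalid": correctly uncertain
                else:
                    fi += 1   # "false invalid": overconfident
            elif confident_cls:
                fp += 1
    denom = tp + ti + fp + fn + fi
    return (tp + ti) / denom if denom else float("nan")
```

Under these assumed rules, a model that confidently labels every dark, indiscernible pixel is scored lower than one that lowers its confidence there, which is the kind of behavior the evaluation described above is meant to expose.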
