Domain Adaptive Video Segmentation via Temporal Consistency Regularization

Video semantic segmentation is an essential task for the analysis and understanding of videos. Recent efforts largely focus on supervised video segmentation by learning from fully annotated data, but the learnt models often suffer a clear performance drop when applied to videos from a different domain. This paper presents DA-VSN, a domain adaptive video segmentation network that addresses domain gaps in videos through temporal consistency regularization (TCR) over consecutive frames of target-domain videos. DA-VSN consists of two novel and complementary designs. The first is cross-domain TCR, which uses adversarial learning to guide the predictions of target frames to have temporal consistency similar to that of source frames (learnt from annotated source data). The second is intra-domain TCR, which guides unconfident predictions of target frames to have temporal consistency similar to that of confident predictions of target frames. Extensive experiments show that DA-VSN consistently outperforms multiple baselines by large margins.
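As a rough illustration of the intra-domain idea only, the sketch below pulls an unconfident current-frame prediction toward a more confident prediction from the previous frame. It is not the DA-VSN implementation: the function names are hypothetical, the previous-frame prediction is assumed to be already aligned to the current frame, and both the use of per-pixel entropy as the confidence measure and the L1 gap as the penalty are assumptions made for this example.

```python
import math


def entropy(probs):
    """Shannon entropy (natural log) of one pixel's class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def intra_domain_tcr_loss(curr, prev_aligned):
    """Toy intra-domain consistency penalty (hypothetical helper).

    curr, prev_aligned: lists of per-pixel class-probability lists that
    refer to the same pixel locations (alignment is assumed done upstream).
    A pixel contributes only when the previous-frame prediction is more
    confident (lower entropy) than the current one; its contribution is
    the L1 gap between the two distributions. The result is averaged over
    all pixels. In an autodiff framework the confident prediction would be
    treated as a fixed target.
    """
    if len(curr) != len(prev_aligned):
        raise ValueError("frames must contain the same number of pixels")
    if not curr:
        return 0.0
    total = 0.0
    for p_curr, p_prev in zip(curr, prev_aligned):
        if entropy(p_prev) < entropy(p_curr):
            total += sum(abs(a - b) for a, b in zip(p_curr, p_prev))
    return total / len(curr)


# Two pixels, two classes: only the first pixel is less confident in the
# current frame, so only it is penalized.
curr_frame = [[0.5, 0.5], [0.9, 0.1]]
prev_frame = [[0.9, 0.1], [0.6, 0.4]]
loss = intra_domain_tcr_loss(curr_frame, prev_frame)
```

The one-directional mask reflects the stated goal that confident predictions guide unconfident ones rather than the reverse; a symmetric penalty would also drag confident predictions toward uncertain ones.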
