A Survey on Deep Learning Technique for Video Segmentation

Video segmentation, i.e., partitioning video frames into multiple segments or objects, plays a critical role in a broad range of practical applications, e.g., visual effects assistance in movies, scene understanding in autonomous driving, and virtual background creation in video conferencing. Recently, with the renaissance of connectionism in computer vision, numerous deep learning based approaches to video segmentation have emerged and delivered compelling performance. In this survey, we comprehensively review two basic lines of research in this area, i.e., generic object segmentation (of unknown categories) in videos and video semantic segmentation, by introducing their respective task settings, background concepts, perceived need, development history, and main challenges. We also provide a detailed overview of representative literature on both methods and datasets, and present quantitative performance comparisons of the reviewed methods on benchmark datasets. Finally, we point out a set of open issues in this field and suggest possible opportunities for further research.

[1]  Ross B. Girshick,et al.  Mask R-CNN , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[2]  In So Kweon,et al.  Video Panoptic Segmentation , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[3]  Thomas Brox,et al.  Motion Trajectory Segmentation via Minimum Cost Multicuts , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[4]  Longin Jan Latecki,et al.  Maximum weight cliques with mutex constraints for video object segmentation , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[5]  Guosheng Lin,et al.  Video Object Segmentation and Tracking: A Survey , 2019, ArXiv.

[6]  Luc Van Gool,et al.  A Benchmark Dataset and Evaluation Methodology for Video Object Segmentation , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[7]  Cheng Deng,et al.  Asymmetric Cross-Guided Attention Network for Actor and Action Video Segmentation From Natural Language Query , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[8]  Tatsuya Harada,et al.  Spatio-Temporal Person Retrieval via Natural Language Queries , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[9]  Ignas Budvytis,et al.  Semi-Supervised Video Segmentation Using Tree Structured Graphical Models , 2013, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[10]  Ding Liu,et al.  CompFeat: Comprehensive Feature Aggregation for Video Instance Segmentation , 2020, AAAI.

[11]  L. Gool,et al.  Three Ways to Improve Semantic Segmentation with Self-Supervised Depth Estimation , 2020, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[12]  Zhe L. Lin,et al.  Fast Video Object Segmentation via Dynamic Targeting Network , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[13]  Chenliang Xu,et al.  Can humans fly? Action understanding with multiple classes of actors , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[14]  Si Liu,et al.  Collaborative Spatial-Temporal Modeling for Language-Queried Video Actor Segmentation , 2021, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[15]  Xiaojuan Qi,et al.  AGSS-VOS: Attention Guided Single-Shot Video Object Segmentation , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[16]  Min Bai,et al.  UPSNet: A Unified Panoptic Segmentation Network , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[17]  Patrick Pérez,et al.  Geodesic image and video editing , 2010, TOGS.

[18]  Takeo Kanade,et al.  A multi-body factorization method for motion analysis , 1995, Proceedings of IEEE International Conference on Computer Vision.

[19]  Miriam Bellver,et al.  RVOS: End-To-End Recurrent Network for Video Object Segmentation , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[20]  Bernt Schiele,et al.  Learning Video Object Segmentation from Static Images , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[21]  Luc Van Gool,et al.  The 2019 DAVIS Challenge on VOS: Unsupervised Multi-Object Segmentation , 2019, ArXiv.

[22]  Maxwell D. Collins,et al.  Leveraging Semi-Supervised Learning in Video Sequences for Urban Scene Segmentation , 2020, ArXiv.

[23]  Shenghua Gao,et al.  Learning to Recommend Frame for Interactive Video Object Segmentation in the Wild , 2021, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[24]  Ming-Hsuan Yang,et al.  Fast and Accurate Online Video Object Segmentation via Tracking Parts , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[25]  Gang Wang,et al.  Motion-Guided Cascaded Refinement Network for Video Object Segmentation , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[26]  Ganesh Venkatesh,et al.  Learning Dynamic Network Using a Reuse Gate Function in Semi-supervised Video Object Segmentation , 2020, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[27]  Benjamin M. Marlin,et al.  The Shape-Time Random Field for Semantic Video Labeling , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[28]  Laura Leal-Taixé,et al.  STEm-Seg: Spatio-temporal Embeddings for Instance Segmentation in Videos , 2020, ECCV.

[29]  Erika Lu,et al.  MAST: A Memory-Augmented Self-Supervised Tracker , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[30]  Luca Bertinetto,et al.  Anchor Diffusion for Unsupervised Video Object Segmentation , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[31]  Michael J. Black,et al.  Video Segmentation via Object Flow , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[32]  Benjamin Höferlin,et al.  Evaluation of background subtraction techniques for video surveillance , 2011, CVPR 2011.

[33]  Kristen Grauman,et al.  FusionSeg: Learning to Combine Motion and Appearance for Fully Automatic Segmentation of Generic Objects in Videos , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[34]  Song Bai,et al.  Occluded Video Instance Segmentation , 2021, ArXiv.

[35]  Thomas Brox,et al.  A Large Dataset to Train Convolutional Networks for Disparity, Optical Flow, and Scene Flow Estimation , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[36]  Stefan Roth,et al.  Joint Optical Flow and Temporally Consistent Semantic Segmentation , 2016, ECCV Workshops.

[37]  Cees Snoek,et al.  Actor and Action Video Segmentation from a Sentence , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[38]  J. Serrat,et al.  Learning Multi-Object Tracking and Segmentation From Automatic Annotations , 2019, Computer Vision and Pattern Recognition.

[39]  James M. Rehg,et al.  Video Segmentation by Tracking Many Figure-Ground Segments , 2013, 2013 IEEE International Conference on Computer Vision.

[40]  Lukasz Kaiser,et al.  Attention is All you Need , 2017, NIPS.

[41]  Yizhou Yu,et al.  Visual saliency based on multiscale deep features , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[42]  Wenguan Wang,et al.  Shifting More Attention to Video Salient Object Detection , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[43]  Yuchen Fan,et al.  Video Instance Segmentation , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[44]  David J. Crandall,et al.  Segmenting Objects From Relational Visual Data , 2021, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[45]  Stefano Soatto,et al.  Unsupervised Moving Object Detection via Contextual Information Separation , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[46]  Gedas Bertasius,et al.  Classifying, Segmenting, and Tracking Object Instances in Video with Mask Propagation , 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[47]  Yu Li,et al.  Fast Video Object Segmentation using the Global Context Module , 2020, ECCV.

[48]  Song Bai,et al.  SwiftNet: Real-time Video Object Segmentation , 2021, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[49]  Yunchao Wei,et al.  Collaborative Video Object Segmentation by Foreground-Background Integration , 2020, ECCV.

[50]  Lijuan Wang,et al.  Pyramid Constrained Self-Attention Network for Fast Video Salient Object Detection , 2020, AAAI.

[51]  Cordelia Schmid,et al.  Learning object class detectors from weakly annotated video , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[52]  Chunhua Shen,et al.  End-to-End Video Instance Segmentation with Transformers , 2020, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[53]  Cordelia Schmid,et al.  Towards Understanding Action Recognition , 2013, 2013 IEEE International Conference on Computer Vision.

[54]  Luc Van Gool,et al.  Video Object Segmentation with Episodic Graph Memory Networks , 2020, ECCV.

[55]  Mubarak Shah,et al.  CapsuleVOS: Semi-Supervised Video Object Segmentation Using Capsule Routing , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[56]  Yu Liu,et al.  Online Meta Adaptation for Fast Video Object Segmentation , 2020, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[57]  Iasonas Kokkinos,et al.  Deep Spatio-Temporal Random Fields for Efficient Video Segmentation , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[58]  Yeong Jun Koh,et al.  Guided Interactive Video Object Segmentation Using Reliability-Based Attention Maps , 2021, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[59]  Karteek Alahari,et al.  Learning Motion Patterns in Videos , 2016, CVPR.

[60]  Trevor Darrell,et al.  Fully Convolutional Networks for Semantic Segmentation , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[61]  Thomas Brox,et al.  Video Segmentation with Just a Few Strokes , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[62]  Jitendra Malik,et al.  Learning to detect natural image boundaries using local brightness, color, and texture cues , 2004, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[63]  Ning Xu,et al.  Fast User-Guided Video Object Segmentation by Interaction-And-Propagation Networks , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[64]  Alan Fern,et al.  Budget-Aware Deep Semantic Video Segmentation , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[65]  Sergio Guadarrama,et al.  Tracking Emerges by Colorizing Videos , 2018, ECCV.

[66]  Harpreet S. Sawhney,et al.  Compact Representations of Videos Through Dominant and Multiple Motion Estimation , 1996, IEEE Trans. Pattern Anal. Mach. Intell..

[67]  Vladlen Koltun,et al.  Feature Space Optimization for Semantic Video Segmentation , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[68]  Fei-Fei Li,et al.  Discriminative Segment Annotation in Weakly Labeled Video , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[69]  Ian D. Reid,et al.  Robust Real-Time Visual Tracking Using Pixel-Wise Posteriors , 2008, ECCV.

[70]  Qin Huang,et al.  Instance Embedding Transfer to Unsupervised Video Object Segmentation , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[71]  Gregory J. Zelinsky,et al.  Efficient Video Segmentation Using Parametric Graph Partitioning , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[72]  James M. Rehg,et al.  The Secrets of Salient Object Segmentation , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[73]  Sudipta Roy,et al.  A Survey on Video Segmentation , 2013, ICACNI.

[74]  Michael Felsberg,et al.  A Generative Appearance Model for End-To-End Video Object Segmentation , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[75]  José García Rodríguez,et al.  A survey on deep learning techniques for image and video semantic segmentation , 2018, Appl. Soft Comput..

[76]  Hao Wang,et al.  Context Modulated Dynamic Networks for Actor and Action Video Segmentation with Language Queries , 2020, AAAI.

[77]  Allan Jabri,et al.  Learning Correspondence From the Cycle-Consistency of Time , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[78]  Yizhou Wang,et al.  Video Object Segmentation by Learning Location-Sensitive Embeddings , 2018, ECCV.

[79]  Luc Van Gool,et al.  The 2017 DAVIS Challenge on Video Object Segmentation , 2017, ArXiv.

[80]  Aggelos K. Katsaggelos,et al.  Efficient Video Object Segmentation via Network Modulation , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[81]  Xiangxu Meng,et al.  Discontinuity-aware video object cutout , 2012, ACM Trans. Graph..

[82]  Trevor Darrell,et al.  BDD100K: A Diverse Driving Video Database with Scalable Annotation Tooling , 2018, ArXiv.

[83]  Jason Weston,et al.  End-To-End Memory Networks , 2015, NIPS.

[84]  P. Anandan,et al.  A unified approach to moving object detection in 2D and 3D scenes , 1996, Proceedings of 13th International Conference on Pattern Recognition.

[85]  Shawn D. Newsam,et al.  Improving Semantic Segmentation via Video Propagation and Label Relaxation , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[86]  Fatih Murat Porikli,et al.  Selective Video Object Cutout , 2017, IEEE Transactions on Image Processing.

[87]  Peter V. Gehler,et al.  Video Propagation Networks , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[88]  Yao Lu,et al.  Coherent Parametric Contours for Interactive Video Object Segmentation , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[89]  Xiao Liu,et al.  Weakly Supervised Multiclass Video Segmentation , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[90]  Du Tran,et al.  Unidentified Video Objects: A Benchmark for Dense, Open-World Segmentation , 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).

[91]  Xueting Li,et al.  Joint-task Self-supervised Learning for Temporal Correspondence , 2019, NeurIPS.

[92]  Atsushi Nakazawa,et al.  Motion Coherent Tracking Using Multi-label MRF Optimization , 2012, International Journal of Computer Vision.

[93]  Dani Lischinski,et al.  JumpCut , 2015, ACM Trans. Graph..

[94]  Stephen Lin,et al.  A Transductive Approach for Video Object Segmentation , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[95]  Zhenheng Yang,et al.  Weakly Supervised Instance Segmentation for Videos with Temporal Mask Consistency , 2021, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[96]  James M. Rehg,et al.  Weakly Supervised Learning of Object Segmentations from Web-Scale Video , 2012, ECCV Workshops.

[97]  Yichen Wei,et al.  Deep Feature Flow for Video Recognition , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[98]  Daniel Cremers,et al.  Motion Competition: A Variational Approach to Piecewise Parametric Motion Segmentation , 2005, International Journal of Computer Vision.

[99]  Geoffrey E. Hinton,et al.  Matrix capsules with EM routing , 2018, ICLR.

[100]  Jason J. Corso,et al.  BubbleNets: Learning to Select the Guidance Frame in Video Object Segmentation by Deep Sorting Frames , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[101]  Chun-Yi Lee,et al.  Dynamic Video Segmentation Network , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[102]  Luc Van Gool,et al.  The Pascal Visual Object Classes Challenge: A Retrospective , 2014, International Journal of Computer Vision.

[103]  Jitendra Malik,et al.  Tracking as Repeated Figure/Ground Segmentation , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[104]  Chi-Keung Tang,et al.  Fast Video Object Segmentation With Temporal Aggregation Network and Dynamic Template Matching , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[105]  Xiankai Lu,et al.  Video Object Segmentation Using Global and Instance Embedding Learning , 2021, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[106]  H. Yao,et al.  Efficient Regional Memory Network for Video Object Segmentation , 2021, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[107]  Mubarak Shah,et al.  Video Object Segmentation through Spatially Accurate and Temporally Dense Extraction of Primary Object Regions , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[108]  Shi-Min Hu,et al.  Global contrast based salient region detection , 2011, CVPR 2011.

[109]  Dongdong Yu,et al.  F2Net: Learning to Focus on the Foreground for Unsupervised Video Object Segmentation , 2020, AAAI.

[110]  Huchuan Lu,et al.  Saliency Detection via Graph-Based Manifold Ranking , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[111]  Alex Pentland,et al.  Pfinder: Real-Time Tracking of the Human Body , 1997, IEEE Trans. Pattern Anal. Mach. Intell..

[112]  Ling Shao,et al.  Target-Aware Object Discovery and Association for Unsupervised Video Multi-Object Segmentation , 2021, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[113]  Roberto Cipolla,et al.  Label propagation in video sequences , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[114]  Chenliang Xu,et al.  Evaluation of super-voxel methods for early video processing , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[115]  Li Xu,et al.  Hierarchical Saliency Detection , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[116]  Min Sun,et al.  Efficient Uncertainty Estimation for Semantic Segmentation in Videos , 2018, ECCV.

[117]  Si Liu,et al.  Cross-Modal Progressive Comprehension for Referring Segmentation , 2021, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[118]  Chunhua Shen,et al.  Efficient Semantic Video Segmentation with Per-frame Inference , 2020, ECCV.

[119]  A. Criminisi,et al.  Bilayer Segmentation of Live Video , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[120]  Pietro Perona,et al.  Microsoft COCO: Common Objects in Context , 2014, ECCV.

[121]  Jitendra Malik,et al.  From Lifestyle Vlogs to Everyday Interactions , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[122]  Kristen Grauman,et al.  Supervoxel-Consistent Foreground Propagation in Video , 2014, ECCV.

[123]  Arnold W. M. Smeulders,et al.  Tracking by Natural Language Specification , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[124]  Ruigang Yang,et al.  Semi-Supervised Video Object Segmentation with Super-Trajectories , 2019, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[125]  Ho Kei Cheng,et al.  Modular Interactive Video Object Segmentation: Interaction-to-Mask, Propagation and Difference-Aware Fusion , 2021, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[126]  Guillermo Sapiro,et al.  Video SnapCut: robust video object cutout using localized classifiers , 2009, SIGGRAPH 2009.

[127]  Felix Järemo Lawin,et al.  Learning What to Learn for Video Object Segmentation , 2020, ECCV.

[128]  Vittorio Ferrari,et al.  Fast Object Segmentation in Unconstrained Video , 2013, 2013 IEEE International Conference on Computer Vision.

[129]  Laura Leal-Taixe,et al.  Make One-Shot Video Object Segmentation Efficient Again , 2020, NeurIPS.

[130]  Ling Shao,et al.  RANet: Ranking Attention Network for Fast Video Object Segmentation , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[131]  Bastian Leibe,et al.  FEELVOS: Fast End-To-End Embedding Learning for Video Object Segmentation , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[132]  Edward H. Adelson,et al.  Layered representation for motion analysis , 1993, Proceedings of IEEE Conference on Computer Vision and Pattern Recognition.

[133]  John See,et al.  Delving into the Cyclic Mechanism in Semi-supervised Video Object Segmentation , 2020, NeurIPS.

[134]  Jan-Olof Eklundh,et al.  Statistical background subtraction for a mobile observer , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[135]  Zhiwu Lu,et al.  Every Frame Counts: Joint Learning of Video Segmentation and Optical Flow , 2019, AAAI.

[136]  Jiaxu Miao,et al.  VSPW: A Large-scale Dataset for Video Scene Parsing in the Wild , 2021, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[137]  Jitendra Malik,et al.  Object Segmentation by Long Term Analysis of Point Trajectories , 2010, ECCV.

[138]  Bastian Leibe,et al.  UnOVOST : Unsupervised Offline Video Object Segmentation and Tracking for the 2019 Unsupervised DAVIS Challenge , 2019 .

[139]  Junsong Yuan,et al.  Track to Detect and Segment: An Online Multi-Object Tracker , 2021, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[140]  Ning Xu,et al.  Video Object Segmentation Using Space-Time Memory Networks , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[141]  Yujin Zhang Chapter I An Overview of Image and Video Segmentation in the Last 40 Years , 2006 .

[142]  Xin Wang,et al.  Accel: A Corrective Fusion Network for Efficient Semantic Segmentation on Video , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[143]  Fabio Viola,et al.  The Kinetics Human Action Video Dataset , 2017, ArXiv.

[144]  Alan Yuille,et al.  ViP-DeepLab: Learning Visual Perception with Depth-aware Video Panoptic Segmentation , 2020, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[145]  Alexander G. Schwing,et al.  VideoMatch: Matching based Video Object Segmentation , 2018, ECCV.

[146]  Guosheng Lin,et al.  MoNet: Deep Motion Exploitation for Video Object Segmentation , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[147]  Ling Shao,et al.  Zero-Shot Video Object Segmentation via Attentive Graph Neural Networks , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[148]  Parham Aarabi,et al.  SSTVOS: Sparse Spatiotemporal Transformers for Video Object Segmentation , 2021, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[149]  Xiaojuan Qi,et al.  Memory Selection Network for Video Propagation , 2020, European Conference on Computer Vision.

[150]  In So Kweon,et al.  Learning to Associate Every Segment for Video Panoptic Segmentation , 2021, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[151]  Vladlen Koltun,et al.  Playing for Benchmarks , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[152]  Michael Gygli,et al.  Interactive Video Object Segmentation in the Wild , 2017, ArXiv.

[153]  Ling Shao,et al.  Video Salient Object Detection via Fully Convolutional Networks , 2017, IEEE Transactions on Image Processing.

[154]  Yingjie Chen,et al.  SG-Net: Spatial Granularity Network for One-Stage Video Instance Segmentation , 2021, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[155]  Yong Jae Lee,et al.  Track and Segment: An Iterative Unsupervised Approach for Video Object Proposals , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[156]  Alexei A. Efros,et al.  Space-Time Correspondence as a Contrastive Random Walk , 2020, NeurIPS.

[157]  Karteek Alahari,et al.  Learning Video Object Segmentation with Visual Memory , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[158]  Ruigang Yang,et al.  Saliency-Aware Video Object Segmentation , 2018, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[159]  Ramesh C. Jain,et al.  On the Analysis of Accumulative Difference Pictures from Image Sequences of Real World Scenes , 1979, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[160]  Thomas Brox,et al.  Object segmentation in video: A hierarchical variational approach for turning point trajectories into dense regions , 2011, 2011 International Conference on Computer Vision.

[161]  Jan Kautz,et al.  Learning to Track Instances without Video Annotations , 2021, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[162]  Long Quan,et al.  Learning Discriminative Feature with CRF for Unsupervised Video Object Segmentation , 2020, ECCV.

[163]  Yong Jae Lee,et al.  Key-segments for video object segmentation , 2011, 2011 International Conference on Computer Vision.

[164]  Trevor Darrell,et al.  Clockwork Convnets for Video Semantic Segmentation , 2016, ECCV Workshops.

[165]  K.-K. Maninis,et al.  Video Object Segmentation without Temporal Information , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[166]  Jitendra Malik,et al.  Learning to segment moving objects in videos , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[167]  Ming-Hsuan Yang,et al.  SegFlow: Joint Learning for Video Object Segmentation and Optical Flow , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[168]  Lawrence G. Roberts,et al.  Machine Perception of Three-Dimensional Solids , 1963, Outstanding Dissertations in the Computer Sciences.

[169]  Stefan Roth,et al.  MOT16: A Benchmark for Multi-Object Tracking , 2016, ArXiv.

[170]  Wei Liu,et al.  MHP-VOS: Multiple Hypotheses Propagation for Video Object Segmentation , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[171]  Fahad Shahbaz Khan,et al.  SipMask: Spatial Information Preservation for Fast Image and Video Instance Segmentation , 2020, ECCV.

[172]  Chung-Ching Lin,et al.  Video Instance Segmentation Tracking With a Modified VAE Architecture , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[173]  Wei Liu,et al.  CNN in MRF: Video Object Segmentation via Inference in a CNN-Based Higher-Order Spatio-Temporal MRF , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[174]  Sanyuan Zhao,et al.  Pyramid Dilated Deeper ConvLSTM for Video Salient Object Detection , 2018, ECCV.

[175]  Chang-Su Kim,et al.  Online Video Object Segmentation via Convolutional Trident Network , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[176]  Michael Felsberg,et al.  Learning Fast and Robust Target Models for Video Object Segmentation , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[177]  David Salesin,et al.  Keyframe-based tracking for rotoscoping and animation , 2004, ACM Trans. Graph..

[178]  Guoqiang Han,et al.  Reciprocal Transformations for Unsupervised Video Object Segmentation , 2021, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[179]  Jiaya Jia,et al.  Video Instance Segmentation with a Propose-Reduce Paradigm , 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).

[180]  L. Jiao,et al.  New Generation Deep Learning for Video Object Detection: A Survey , 2021, IEEE Transactions on Neural Networks and Learning Systems.

[181]  Xuming He,et al.  Multiclass semantic video segmentation with object-level active inference , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[182]  Gang Yu,et al.  State-Aware Tracker for Real-Time Video Object Segmentation , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[183]  Rong Jin,et al.  Learning Position and Target Consistency for Memory-based Video Object Segmentation , 2021, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[184]  Mubarak Shah,et al.  Visual-Textual Capsule Routing for Text-Based Video Segmentation , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[185]  Xiaoxiao Li,et al.  Video Object Segmentation with Joint Re-identification and Attention-Aware Mask Propagation , 2018, ECCV.

[186]  Mingjie Sun,et al.  Fast Template Matching and Update for Video Object Tracking and Segmentation , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[187]  Zhe L. Lin,et al.  Temporally Distributed Networks for Fast Video Semantic Segmentation , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[188]  Karteek Alahari,et al.  Learning to Segment Moving Objects , 2017, International Journal of Computer Vision.

[189]  Bin Zhou,et al.  Primary Video Object Segmentation via Complementary CNNs and Neighborhood Reversible Flow , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[190]  Euntai Kim,et al.  Kernelized Memory Network for Video Object Segmentation , 2020, ECCV.

[191]  Xiaogang Wang,et al.  Visual Tracking with Fully Convolutional Networks , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[192]  Huchuan Lu,et al.  Learning to Detect Salient Objects with Image-Level Supervision , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[193]  Cristian Sminchisescu,et al.  Semantic Video Segmentation by Gated Recurrent Flow Propagation , 2016, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[194]  Peng Wang,et al.  Semantic Instance Segmentation via Deep Metric Learning , 2017, ArXiv.

[195]  Derek Hoiem,et al.  Indoor Segmentation and Support Inference from RGBD Images , 2012, ECCV.

[196]  Katerina Fragkiadaki,et al.  Video segmentation by tracing discontinuities in a trajectory embedding , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[197]  Peter V. Gehler,et al.  Semantic Video CNNs Through Representation Warping , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[198]  Yang Wang,et al.  Referring Segmentation in Images and Videos With Cross-Modal Self-Attention Network , 2021, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[199]  Markus H. Gross,et al.  Fully Connected Object Proposals for Video Segmentation , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[200]  Gérard G. Medioni,et al.  Detecting Motion Regions in the Presence of a Strong Parallax from a Moving Camera by Multiview Geometric Constraints , 2007, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[201]  Xin Li,et al.  Video Object Segmentation with Adaptive Feature Bank and Uncertain-Region Refinement , 2020, NeurIPS.

[202]  Qi Zhao,et al.  SALICON: Saliency in Context , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[203]  Yunchao Wei,et al.  Memory Aggregation Networks for Efficient Interactive Video Object Segmentation , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[204]  Jitendra Malik,et al.  Ieee Transactions on Pattern Analysis and Machine Intelligence Segmentation of Moving Objects by Long Term Video Analysis , 2022 .

[205]  Hao Shen,et al.  Towards a Mathematical Understanding of the Difficulty in Learning with Feedforward Neural Networks , 2016, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[206]  Sebastian Ramos,et al.  The Cityscapes Dataset for Semantic Urban Scene Understanding , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[207]  Dahua Lin,et al.  Low-Latency Video Semantic Segmentation , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[208]  Yizhou Yu,et al.  Motion Guided Attention for Video Salient Object Detection , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[209]  M. Shah,et al.  Object tracking: A survey , 2006, CSUR.

[210]  Luc Van Gool,et al.  One-Shot Video Object Segmentation , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[211]  Ning Xu,et al.  YouTube-VOS: A Large-Scale Video Object Segmentation Benchmark , 2018, ArXiv.

[212]  Bastian Leibe,et al.  Online Adaptation of Convolutional Neural Networks for Video Object Segmentation , 2017, BMVC.

[213]  Bohyung Han,et al.  URVOS: Unified Referring Video Object Segmentation Network with a Large-Scale Benchmark , 2020, ECCV.

[214]  René Vidal,et al.  Coarse-to-Fine Semantic Video Segmentation Using Supervoxel Trees , 2013, 2013 IEEE International Conference on Computer Vision.

[215]  Radomír Mech,et al.  Unsupervised Video Object Segmentation with Joint Hotspot Tracking , 2020, ECCV.

[216]  Qi Tian,et al.  Polar Relative Positional Encoding for Video-Language Segmentation , 2020, IJCAI.

[217]  Qiang Wang,et al.  Fast Online Object Tracking and Segmentation: A Unifying Approach , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[218]  Stefano Soatto,et al.  DyStaB: Unsupervised Object Segmentation via Dynamic-Static Bootstrapping , 2021, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[219]  Steven C. H. Hoi,et al.  Paying Attention to Video Object Pattern Understanding , 2020, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[220]  Luc Van Gool,et al.  Blazingly Fast Video Object Segmentation with Pixel-Wise Metric Learning , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[221]  C.-C. Jay Kuo,et al.  Unsupervised Video Object Segmentation with Motion-Based Bilateral Networks , 2018, ECCV.

[222]  Luc Van Gool,et al.  The 2018 DAVIS Challenge on Video Object Segmentation , 2018, ArXiv.

[223]  Mei Han,et al.  Efficient hierarchical graph-based video segmentation , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[224]  Jun-Sik Kim,et al.  Pixel-Level Matching for Video Object Segmentation Using Convolutional Neural Networks , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[225]  Martin Jägersand,et al.  Video Segmentation using Teacher-Student Adaptation in a Human Robot Interaction (HRI) Setting , 2018, ArXiv.

[226]  Bernt Schiele,et al.  Video Object Segmentation with Language Referring Expressions , 2018, ACCV.

[227]  Michal Irani,et al.  Video Segmentation by Non-Local Consensus Voting , 2014, BMVC.

[228]  Xiaojun Chang,et al.  Reinforcement Cutting-Agent Learning for Video Object Segmentation , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[229]  Andreas Geiger,et al.  MOTS: Multi-Object Tracking and Segmentation , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[230]  Lars Petersson,et al.  Bringing Background into the Foreground: Making All Classes Equal in Weakly-Supervised Video Semantic Segmentation , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[231]  Andreas Geiger,et al.  Are we ready for autonomous driving? The KITTI vision benchmark suite , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[232]  Roberto Cipolla,et al.  Semantic object classes in video: A high-definition ground truth database , 2009, Pattern Recognition Letters.

[233]  Yuan Xie,et al.  Flow Guided Recurrent Neural Encoder for Video Salient Object Detection , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[234]  Ling Shao,et al.  See More, Know More: Unsupervised Video Object Segmentation With Co-Attention Siamese Networks , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[235]  Thomas Brox,et al.  Lucid Data Dreaming for Video Object Segmentation , 2017, International Journal of Computer Vision.

[236]  Michael S. Bernstein,et al.  ImageNet Large Scale Visual Recognition Challenge , 2014, International Journal of Computer Vision.

[237]  Sanja Fidler,et al.  ScribbleBox: Interactive Annotation Framework for Video Object Segmentation , 2020, ECCV.

[238]  Kalyan Sunkavalli,et al.  Fast Video Object Segmentation by Reference-Guided Mask Propagation , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[239]  Ramakant Nevatia,et al.  TALL: Temporal Activity Localization via Language Query , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[240]  Jian Dong,et al.  Video Scene Parsing with Predictive Feature Learning , 2016, 2017 IEEE International Conference on Computer Vision (ICCV).

[241]  John W. Fisher,et al.  A Video Representation Using Temporal Superpixels , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[242]  Ming-Hsuan Yang,et al.  JOTS: Joint Online Tracking and Segmentation , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[243]  Arnold W. M. Smeulders,et al.  Long-term Tracking in the Wild: A Benchmark , 2018, ECCV.

[244]  Roberto Cipolla,et al.  Learning a Spatio-Temporal Embedding for Video Instance Segmentation , 2019, ArXiv.

[245]  Alexander Sorkine-Hornung,et al.  Bilateral Space Video Segmentation , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[246]  Simone Calderara,et al.  Visual Tracking: An Experimental Survey , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[247]  Trevor Darrell,et al.  Localizing Moments in Video with Natural Language , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[248]  Derek Hoiem,et al.  Category Independent Object Proposals , 2010, ECCV.

[249]  Sanja Fidler,et al.  DMM-Net: Differentiable Mask-Matching Network for Video Object Segmentation , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[250]  Steven C. H. Hoi,et al.  Learning Video Object Segmentation From Unlabeled Videos , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[251]  Ning Xu,et al.  Deep Interactive Object Selection , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[252]  Sanyuan Zhao,et al.  Learning Unsupervised Video Object Segmentation Through Visual Attention , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[253]  Jiebo Luo,et al.  Zero-Shot Video Object Segmentation With Co-Attention Siamese Networks , 2020, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[254]  Chang-Su Kim,et al.  Interactive Video Object Segmentation Using Global and Local Transfer Modules , 2020, ECCV.

[255]  Jitendra Malik,et al.  Shape matching and object recognition using shape contexts , 2002, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[256]  Joelle Pineau,et al.  Conditional Computation in Neural Networks for faster models , 2015, ArXiv.

[257]  R. Venkatesh Babu,et al.  SeamSeg: Video Object Segmentation Using Patch Seams , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[258]  Chang-Su Kim,et al.  Streaming Video Segmentation via Short-Term Hierarchical Segmentation and Frame-by-Frame Markov Random Field Optimization , 2016, ECCV.

[259]  James T. Tippett,et al.  Optical and Electro-Optical Information Processing , 1965 .