Semi-supervised Domain Adaptation for Weakly Labeled Semantic Video Object Segmentation

Deep convolutional neural networks (CNNs) have been immensely successful in many high-level computer vision tasks given large labelled datasets. However, for video semantic object segmentation, a domain where labels are scarce, effectively exploiting the representation power of CNN with limited training data remains a challenge. Simply borrowing the existing pre-trained CNN image recognition model for video segmentation task can severely hurt performance. We propose a semi-supervised approach to adapting CNN image recognition model trained from labelled image data to the target domain exploiting both semantic evidence learned from CNN, and the intrinsic structures of video data. By explicitly modelling and compensating for the domain shift from the source domain to the target domain, this proposed approach underpins a robust semantic object segmentation method against the changes in appearance, shape and occlusion in natural videos. We present extensive experiments on challenging datasets that demonstrate the superior performance of our approach compared with the state-of-the-art methods.

[1]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[2]  Tapani Raiko,et al.  Semi-supervised Learning with Ladder Networks , 2015, NIPS.

[3]  Derek Hoiem,et al.  Category Independent Object Proposals , 2010, ECCV.

[4]  Tinghuai Wang,et al.  Graph Transduction Learning of Object Proposals for Video Object Segmentation , 2014, ACCV.

[5]  Jitendra Malik,et al.  Occlusion boundary detection and figure/ground assignment from optical flow , 2011, CVPR 2011.

[6]  Kristen Grauman,et al.  Supervoxel-Consistent Foreground Propagation in Video , 2014, ECCV.

[7]  Bernhard Schölkopf,et al.  Learning with Local and Global Consistency , 2003, NIPS.

[8]  Bo Han,et al.  TouchCut: Fast image and video segmentation using single-touch interaction , 2014, Comput. Vis. Image Underst..

[9]  Tapani Raiko,et al.  Semi-supervised Learning with Ladder Networks , 2015, NIPS.

[10]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[11]  Santiago Manen,et al.  Prime Object Proposals with Randomized Prim's Algorithm , 2013, 2013 IEEE International Conference on Computer Vision.

[12]  Xiao Liu,et al.  Weakly Supervised Multiclass Video Segmentation , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[13]  Vittorio Ferrari,et al.  Fast Object Segmentation in Unconstrained Video , 2013, 2013 IEEE International Conference on Computer Vision.

[14]  Harry Shum,et al.  Video tooning , 2004, ACM Trans. Graph..

[15]  Andrew Blake,et al.  "GrabCut" , 2004, ACM Trans. Graph..

[16]  John P. Collomosse,et al.  Stroke surfaces: temporally coherent artistic animations from video , 2005, IEEE Transactions on Visualization and Computer Graphics.

[17]  Michal Irani,et al.  Video Segmentation by Non-Local Consensus voting , 2014, BMVC.

[18]  Jitendra Malik,et al.  Object Segmentation by Long Term Analysis of Point Trajectories , 2010, ECCV.

[19]  James M. Rehg,et al.  Weakly Supervised Learning of Object Segmentations from Web-Scale Video , 2012, ECCV Workshops.

[20]  Mubarak Shah,et al.  Video Object Segmentation through Spatially Accurate and Temporally Dense Extraction of Primary Object Regions , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[21]  Francesca Murabito,et al.  Superpixel-based video object segmentation using perceptual organization and location prior , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[22]  Thomas Brox,et al.  High Accuracy Optical Flow Estimation Based on a Theory for Warping , 2004, ECCV.

[23]  Fatih Murat Porikli,et al.  Saliency-aware geodesic video object segmentation , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[24]  Olga Veksler,et al.  Fast Approximate Energy Minimization via Graph Cuts , 2001, IEEE Trans. Pattern Anal. Mach. Intell..

[25]  Tinghuai Wang,et al.  Primary object discovery and segmentation in videos via graph-based transductive inference , 2016, Comput. Vis. Image Underst..

[26]  Trevor Darrell,et al.  Caffe: Convolutional Architecture for Fast Feature Embedding , 2014, ACM Multimedia.

[27]  Fei-Fei Li,et al.  Discriminative Segment Annotation in Weakly Labeled Video , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[28]  Mei Han,et al.  Efficient hierarchical graph-based video segmentation , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[29]  Jitendra Malik,et al.  From contours to regions: An empirical evaluation , 2009, CVPR.

[30]  Nanning Zheng,et al.  Video Object Discovery and Co-Segmentation with Extremely Weak Supervision , 2017, IEEE Trans. Pattern Anal. Mach. Intell..

[31]  Markus H. Gross,et al.  Fully Connected Object Proposals for Video Segmentation , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[32]  Chen Wang,et al.  Semantic object segmentation via detection in weakly labeled video , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[33]  Trevor Darrell,et al.  Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation , 2013, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[34]  Chenliang Xu,et al.  Streaming Hierarchical Video Segmentation , 2012, ECCV.

[35]  Alon Faktor,et al.  Video Object Segmentation by Non-Local Consensus voting , 2014, BMVC 2014.

[36]  James M. Rehg,et al.  Video Segmentation by Tracking Many Figure-Ground Segments , 2013, 2013 IEEE International Conference on Computer Vision.

[37]  Cordelia Schmid,et al.  Learning object class detectors from weakly annotated video , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[38]  Tinghuai Wang,et al.  Probabilistic Motion Diffusion of Labeling Priors for Coherent Video Segmentation , 2012, IEEE Transactions on Multimedia.

[39]  Brian Taylor,et al.  Causal video object segmentation from persistence of occlusions , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[40]  Nikos Paragios,et al.  Segmentation, ordering and multi-object tracking using graphical models , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[41]  Atsushi Nakazawa,et al.  Motion Coherent Tracking Using Multi-label MRF Optimization , 2012, International Journal of Computer Vision.

[42]  Junsong Yuan,et al.  Discovering Primary Objects in Videos by Saliency Fusion and Iterative Appearance Estimation , 2016, IEEE Transactions on Circuits and Systems for Video Technology.

[43]  Yong Jae Lee,et al.  Key-segments for video object segmentation , 2011, 2011 International Conference on Computer Vision.