Combining Self Training and Active Learning for Video Segmentation

This work addresses the problem of segmenting an object of interest out of a video. We show that video object segmentation can be naturally cast as a semi-supervised learning problem and be efficiently solved using harmonic functions. We propose an incremental self-training approach by iteratively labeling the least uncertain frame and updating similarity metrics. Our self-training video segmentation produces superior results both qualitatively and quantitatively. Moreover, usage of harmonic functions naturally supports interactive segmentation. We suggest active learning methods for providing guidance to user on what to annotate in order to improve labeling efficiency. We present experimental results using a ground truth data set and a quantitative comparison to a representative object segmentation system.

[1]  Burr Settles,et al.  Active Learning Literature Survey , 2009 .

[2]  J. Lafferty,et al.  Combining active learning and semi-supervised learning using Gaussian fields and harmonic functions , 2003, ICML 2003.

[3]  Rong Yan,et al.  Automatically labeling video data using multi-class active learning , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[4]  Roberto Cipolla,et al.  Label propagation in video sequences , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[5]  Trevor Darrell,et al.  Active Learning with Gaussian Processes for Object Categorization , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[6]  Horst Bischof,et al.  On-line semi-supervised multiple-instance boosting , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[7]  James M. Rehg,et al.  Motion Coherent Tracking with Multi-label MRF optimization , 2010, BMVC.

[8]  Jitendra Malik,et al.  Tracking as Repeated Figure/Ground Segmentation , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[9]  Andrew Zisserman,et al.  A Statistical Approach to Texture Classification from Single Images , 2004, International Journal of Computer Vision.

[10]  Ian D. Reid,et al.  Robust Real-Time Visual Tracking Using Pixel-Wise Posteriors , 2008, ECCV.

[11]  Leo Grady,et al.  Random Walks for Image Segmentation , 2006, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[12]  Jiebo Luo,et al.  iCoseg: Interactive co-segmentation with intelligent scribble guidance , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[13]  Yanxi Liu,et al.  Online Selection of Discriminative Tracking Features , 2005, IEEE Trans. Pattern Anal. Mach. Intell..

[14]  Shai Avidan,et al.  Ensemble Tracking , 2005, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[15]  Pushmeet Kohli,et al.  Measuring uncertainty in graph cut solutions , 2008, Comput. Vis. Image Underst..

[16]  Matthijs C. Dorst Distinctive Image Features from Scale-Invariant Keypoints , 2011 .

[17]  Jitendra Malik,et al.  From contours to regions: An empirical evaluation , 2009, CVPR.

[18]  Helen C. Shen,et al.  Semi-Supervised Classification Using Linear Neighborhood Propagation , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[19]  Antonio Torralba,et al.  LabelMe video: Building a video database with human annotations , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[20]  Guillermo Sapiro,et al.  Video SnapCut: robust video object cutout using localized classifiers , 2009, ACM Trans. Graph..

[21]  Cordelia Schmid,et al.  Multimodal semi-supervised learning for image classification , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[22]  Avrim Blum,et al.  Learning from Labeled and Unlabeled Data using Graph Mincuts , 2001, ICML.

[23]  Stanley T. Birchfield,et al.  Adaptive fragments-based tracking of non-rigid objects using level sets , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[24]  Zoubin Ghahramani,et al.  Combining active learning and semi-supervised learning using Gaussian fields and harmonic functions , 2003, ICML 2003.

[25]  Scott Cohen,et al.  LIVEcut: Learning-based interactive video segmentation by evaluation of multiple propagated cues , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[26]  Ignas Budvytis,et al.  Label propagation in complex video sequences using semi-supervised learning , 2010, BMVC.

[27]  Xiaojin Zhu,et al.  --1 CONTENTS , 2006 .

[28]  Joachim M. Buhmann,et al.  Towards weakly supervised semantic segmentation by means of multiple instance and multitask learning , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[29]  Thomas Brox,et al.  High Accuracy Optical Flow Estimation Based on a Theory for Warping , 2004, ECCV.

[30]  Maneesh Agrawala,et al.  Interactive video cutout , 2005, SIGGRAPH 2005.

[31]  R. Collins,et al.  On-line selection of discriminative tracking features , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.