Motion-Guided Cascaded Refinement Network for Video Object Segmentation

In this work, we propose a motion-guided cascaded refinement network for video object segmentation. By assuming the foreground objects show different motion patterns from the background, for each video frame we apply an active contour model on optical flow to coarsely segment the foreground. The proposed Cascaded Refinement Network (CRN) then takes as guidance the coarse segmentation to generate an accurate segmentation in full resolution. In this way, the motion information and the deep CNNs can complement each other well to accurately segment the foreground objects from video frames. To deal with multi-instance cases, we extend our method with a spatial-temporal instance embedding model that further segments the foreground regions into instances and propagates instance labels. We further introduce a single-channel residual attention module in CRN to incorporate the coarse segmentation map as attention, which makes the network effective and efficient in both training and testing. We perform experiments on popular benchmarks and the results show that our method achieves state-of-the-art performance with high time efficiency.

[1]  Anthony J. Yezzi,et al.  Curve evolution implementation of the Mumford-Shah functional for image segmentation, denoising, interpolation, and magnification , 2001, IEEE Trans. Image Process..

[2]  Vittorio Ferrari,et al.  Fast Object Segmentation in Unconstrained Video , 2013, 2013 IEEE International Conference on Computer Vision.

[3]  Ming-Hsuan Yang,et al.  JOTS: Joint Online Tracking and Segmentation , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[4]  Thomas Brox,et al.  Lucid Data Dreaming for Object Tracking , 2017, ArXiv.

[5]  Jitendra Malik,et al.  Learning to segment moving objects in videos , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[6]  Markus H. Gross,et al.  Fully Connected Object Proposals for Video Segmentation , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[7]  Jian Sun,et al.  Instance-Aware Semantic Segmentation via Multi-task Network Cascades , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[8]  R. Venkatesh Babu,et al.  SeamSeg: Video Object Segmentation Using Patch Seams , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[9]  Stefan Duffner,et al.  PixelTrack: A Fast Adaptive Algorithm for Tracking Non-rigid Objects , 2013, ICCV.

[10]  Michael J. Black,et al.  Video Segmentation via Object Flow , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[11]  Rachid Deriche,et al.  A Review of Statistical Approaches to Level Set Segmentation: Integrating Color, Texture, Motion and Shape , 2007, International Journal of Computer Vision.

[12]  Longin Jan Latecki,et al.  Maximum weight cliques with mutex constraints for video object segmentation , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[13]  Luc Van Gool,et al.  One-Shot Video Object Segmentation , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[14]  Luc Van Gool,et al.  The 2017 DAVIS Challenge on Video Object Segmentation , 2017, ArXiv.

[15]  Shengping Zhang,et al.  Non-Rigid Object Contour Tracking via a Novel Supervised Level Set Model , 2015, IEEE Transactions on Image Processing.

[16]  Fatih Murat Porikli,et al.  Saliency-aware geodesic video object segmentation , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[17]  Bastian Leibe,et al.  Online Adaptation of Convolutional Neural Networks for Video Object Segmentation , 2017, BMVC.

[18]  Bastian Leibe,et al.  Level-set person segmentation and tracking with multi-region appearance models and top-down shape information , 2011, 2011 International Conference on Computer Vision.

[19]  Tony F. Chan,et al.  Active contours without edges , 2001, IEEE Trans. Image Process..

[20]  Jitendra Malik,et al.  Motion segmentation and tracking using normalized cuts , 1998, Sixth International Conference on Computer Vision (IEEE Cat. No.98CH36271).

[21]  Qin Huang,et al.  Instance Embedding Transfer to Unsupervised Video Object Segmentation , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[22]  Bernt Schiele,et al.  Learning Video Object Segmentation from Static Images , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[23]  Bin Zhou,et al.  Primary Video Object Segmentation via Complementary CNNs and Neighborhood Reversible Flow , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[24]  Karteek Alahari,et al.  Learning Video Object Segmentation with Visual Memory , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[25]  Karteek Alahari,et al.  Learning Motion Patterns in Videos , 2016, CVPR.

[26]  Xuelong Li,et al.  Improving Level Set Method for Fast Auroral Oval Segmentation , 2014, IEEE Transactions on Image Processing.

[27]  Thomas Brox,et al.  FlowNet 2.0: Evolution of Optical Flow Estimation with Deep Networks , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[28]  Nahum Kiryati,et al.  Shape-Based Mutual Segmentation , 2008, International Journal of Computer Vision.

[29]  Luc Van Gool,et al.  A Benchmark Dataset and Evaluation Methodology for Video Object Segmentation , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[30]  Ross B. Girshick,et al.  Mask R-CNN , 2017, 1703.06870.

[31]  Katerina Fragkiadaki,et al.  Video segmentation by tracing discontinuities in a trajectory embedding , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[32]  Kristen Grauman,et al.  Supervoxel-Consistent Foreground Propagation in Video , 2014, ECCV.

[33]  Ming-Hsuan Yang,et al.  Fast and Accurate Online Video Object Segmentation via Tracking Parts , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[34]  Peter V. Gehler,et al.  Video Propagation Networks , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[35]  Demetri Terzopoulos,et al.  Snakes: Active contour models , 2004, International Journal of Computer Vision.

[36]  Kristen Grauman,et al.  FusionSeg: Learning to Combine Motion and Appearance for Fully Automatic Segmentation of Generic Objects in Videos , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[37]  Jun-Sik Kim,et al.  Pixel-Level Matching for Video Object Segmentation Using Convolutional Neural Networks , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[38]  Wenguan Wang,et al.  Super-Trajectory for Video Segmentation , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[39]  Chang-Su Kim,et al.  Online Video Object Segmentation via Convolutional Trident Network , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[40]  Huchuan Lu,et al.  Superpixel tracking , 2011, 2011 International Conference on Computer Vision.

[41]  Aggelos K. Katsaggelos,et al.  Efficient Video Object Segmentation via Network Modulation , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[42]  Michal Irani,et al.  Video Segmentation by Non-Local Consensus voting , 2014, BMVC.

[43]  Gang Wang,et al.  Motion-Guided Cascaded Refinement Network for Video Object Segmentation , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[44]  Luc Van Gool,et al.  The Pascal Visual Object Classes (VOC) Challenge , 2010, International Journal of Computer Vision.

[45]  Mubarak Shah,et al.  Video Object Segmentation through Spatially Accurate and Temporally Dense Extraction of Primary Object Regions , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[46]  Luc Van Gool,et al.  Blazingly Fast Video Object Segmentation with Pixel-Wise Metric Learning , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[47]  Chunming Li,et al.  Level set evolution without re-initialization: a new variational formulation , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[48]  Wei Liu,et al.  CNN in MRF: Video Object Segmentation via Inference in a CNN-Based Higher-Order Spatio-Temporal MRF , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[49]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[50]  Chang-Su Kim,et al.  Primary Object Segmentation in Videos Based on Region Augmentation and Reduction , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[51]  Subhransu Maji,et al.  Semantic contours from inverse detectors , 2011, 2011 International Conference on Computer Vision.

[52]  Bernt Schiele,et al.  Video Segmentation with Superpixels , 2012, ACCV.

[53]  Yong Jae Lee,et al.  Track and Segment: An Iterative Unsupervised Approach for Video Object Proposals , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[54]  Carlo Tomasi,et al.  Good features to track , 1994, 1994 Proceedings of IEEE Conference on Computer Vision and Pattern Recognition.

[55]  Thomas Brox,et al.  Object segmentation in video: A hierarchical variational approach for turning point trajectories into dense regions , 2011, 2011 International Conference on Computer Vision.

[56]  Alexander Sorkine-Hornung,et al.  Bilateral Space Video Segmentation , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[57]  Gang Wang,et al.  Deep Level Sets for Salient Object Detection , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[58]  James S. Duncan,et al.  Active Contours with Group Similarity , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[59]  K.-K. Maninis,et al.  Video Object Segmentation without Temporal Information , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[60]  Guillermo Sapiro,et al.  Geodesic Active Contours , 1995, International Journal of Computer Vision.

[61]  W. Clem Karl,et al.  Real-time tracking using level sets , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[62]  Ming-Hsuan Yang,et al.  SegFlow: Joint Learning for Video Object Segmentation and Optical Flow , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[63]  Tony F. Chan,et al.  Active Contours without Edges for Vector-Valued Images , 2000, J. Vis. Commun. Image Represent..

[64]  Mei Han,et al.  Efficient hierarchical graph-based video segmentation , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.