Unsupervised Video Object Segmentation using Motion Saliency-Guided Spatio-Temporal Propagation

Unsupervised video segmentation plays an important role in a wide variety of applications from object identification to compression. However, to date, fast motion, motion blur and occlusions pose significant challenges. To address these challenges for unsupervised video segmentation, we develop a novel saliency estimation technique as well as a novel neighborhood graph, based on optical flow and edge cues. Our approach leads to significantly better initial foreground-background estimates and their robust as well as accurate diffusion across time. We evaluate our proposed algorithm on the challenging DAVIS, SegTrack v2 and FBMS-59 datasets. Despite the usage of only a standard edge detector trained on 200 images, our method achieves state-of-the-art results outperforming deep learning based methods in the unsupervised setting. We even demonstrate competitive results comparable to deep learning based methods in the semi-supervised setting on the DAVIS dataset.

[1]  Jan-Olof Eklundh,et al.  Statistical background subtraction for a mobile observer , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[2]  Karteek Alahari,et al.  Learning Motion Patterns in Videos , 2016, CVPR.

[3]  Katerina Fragkiadaki,et al.  Video segmentation by tracing discontinuities in a trajectory embedding , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[4]  Yunsong Li,et al.  Efficient Coarse-to-Fine Patch Match for Large Displacement Optical Flow , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[5]  Michal Irani,et al.  Computing occluding and transparent motions , 1994, International Journal of Computer Vision.

[6]  Bernt Schiele,et al.  Learning Video Object Segmentation from Static Images , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[7]  Karteek Alahari,et al.  Learning Video Object Segmentation with Visual Memory , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[8]  Ivan Laptev,et al.  Track to the future: Spatio-temporal video segmentation with long-range motion cues , 2011, CVPR 2011.

[9]  Alexander G. Schwing,et al.  VideoMatch: Matching based Video Object Segmentation , 2018, ECCV.

[10]  Chang-Su Kim,et al.  Primary Object Segmentation in Videos via Alternate Convex Optimization of Foreground and Background Distributions , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[11]  Tinghuai Wang,et al.  Probabilistic Motion Diffusion of Labeling Priors for Coherent Video Segmentation , 2012, IEEE Transactions on Multimedia.

[12]  Wenbin Li,et al.  Roto++ , 2016, ACM Trans. Graph..

[13]  Yong Jae Lee,et al.  Key-segments for video object segmentation , 2011, 2011 International Conference on Computer Vision.

[14]  Dani Lischinski,et al.  JumpCut , 2015, ACM Trans. Graph..

[15]  Takeo Kanade,et al.  A multi-body factorization method for motion analysis , 1995, Proceedings of IEEE International Conference on Computer Vision.

[16]  Thomas Brox,et al.  Video Segmentation with Just a Few Strokes , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[17]  Jitendra Malik,et al.  Ieee Transactions on Pattern Analysis and Machine Intelligence Segmentation of Moving Objects by Long Term Video Analysis , 2022 .

[18]  Jitendra Malik,et al.  Object Segmentation by Long Term Analysis of Point Trajectories , 2010, ECCV.

[19]  Shao-Yi Chien,et al.  Real-Time Salient Object Detection with a Minimum Spanning Tree , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[20]  Chang-Su Kim,et al.  Online Video Object Segmentation via Convolutional Trident Network , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[21]  Roberto Cipolla,et al.  Label propagation in video sequences , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[22]  L. Asz Random Walks on Graphs: a Survey , 2022 .

[23]  Scott Cohen,et al.  LIVEcut: Learning-based interactive video segmentation by evaluation of multiple propagated cues , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[24]  David G. Lowe,et al.  Fast Approximate Nearest Neighbors with Automatic Algorithm Configuration , 2009, VISAPP.

[25]  Thomas Brox,et al.  Motion Trajectory Segmentation via Minimum Cost Multicuts , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[26]  Mei Han,et al.  Efficient hierarchical graph-based video segmentation , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[27]  P. Anandan,et al.  A unified approach to moving object detection in 2D and 3D scenes , 1996, Proceedings of 13th International Conference on Pattern Recognition.

[28]  Peter V. Gehler,et al.  Video Propagation Networks , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[29]  Kristen Grauman,et al.  Supervoxel-Consistent Foreground Propagation in Video , 2014, ECCV.

[30]  László Lovász,et al.  Random Walks on Graphs: A Survey , 1993 .

[31]  Alexander Sorkine-Hornung,et al.  Bilateral Space Video Segmentation , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[32]  Yong Jae Lee,et al.  Track and Segment: An Iterative Unsupervised Approach for Video Object Proposals , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[33]  Kristen Grauman,et al.  FusionSeg: Learning to Combine Motion and Appearance for Fully Automatic Segmentation of Generic Objects in Videos , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[34]  William Brendel,et al.  Video object segmentation by tracking regions , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[35]  Jitendra Malik,et al.  Shape matching and object recognition using shape contexts , 2010, 2010 3rd International Conference on Computer Science and Information Technology.

[36]  Cordelia Schmid,et al.  EpicFlow: Edge-preserving interpolation of correspondences for optical flow , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[37]  R. Venkatesh Babu,et al.  SeamSeg: Video Object Segmentation Using Patch Seams , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[38]  Michael J. Black,et al.  Video Segmentation via Object Flow , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[39]  James M. Rehg,et al.  Motion Coherent Tracking with Multi-label MRF optimization , 2010, BMVC.

[40]  Luc Van Gool,et al.  A Benchmark Dataset and Evaluation Methodology for Video Object Segmentation , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[41]  A. Criminisi,et al.  Bilayer Segmentation of Live Video , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[42]  Michal Irani,et al.  Video Segmentation by Non-Local Consensus voting , 2014, BMVC.

[43]  C. Lawrence Zitnick,et al.  Structured Forests for Fast Edge Detection , 2013, 2013 IEEE International Conference on Computer Vision.

[44]  Ehsan Elhamifar,et al.  Sparse subspace clustering , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[45]  Gérard G. Medioni,et al.  Detecting Motion Regions in the Presence of a Strong Parallax from a Moving Camera by Multiview Geometric Constraints , 2007, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[46]  Andrew Zisserman,et al.  Concerning Bayesian Motion Segmentation, Model, Averaging, Matching and the Trifocal Tensor , 1998, ECCV.

[47]  Mubarak Shah,et al.  Video Object Segmentation through Spatially Accurate and Temporally Dense Extraction of Primary Object Regions , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[48]  L. Davis,et al.  Background and foreground modeling using nonparametric kernel density estimation for visual surveillance , 2002, Proc. IEEE.

[49]  Jian Sun,et al.  Geodesic Saliency Using Background Priors , 2012, ECCV.

[50]  Kristen Grauman,et al.  Active Frame Selection for Label Propagation in Videos , 2012, ECCV.

[51]  Alexander G. Schwing,et al.  MaskRNN: Instance Level Video Object Segmentation , 2018, NIPS.

[52]  Chin-Seng Chua,et al.  Statistical background modeling for non-stationary camera , 2003, Pattern Recognit. Lett..

[53]  Thomas Brox,et al.  A Unified Video Segmentation Benchmark: Annotation, Metrics and Analysis , 2013, 2013 IEEE International Conference on Computer Vision.

[54]  Chang-Su Kim,et al.  Primary Object Segmentation in Videos Based on Region Augmentation and Reduction , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[55]  Markus H. Gross,et al.  Fully Connected Object Proposals for Video Segmentation , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[56]  René Vidal,et al.  Motion segmentation via robust subspace separation in the presence of outlying, incomplete, or corrupted trajectories , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[57]  Alon Faktor,et al.  Video Object Segmentation by Non-Local Consensus voting , 2014, BMVC 2014.

[58]  James M. Rehg,et al.  Video Segmentation by Tracking Many Figure-Ground Segments , 2013, 2013 IEEE International Conference on Computer Vision.

[59]  Benjamin Höferlin,et al.  Evaluation of background subtraction techniques for video surveillance , 2011, CVPR 2011.

[60]  Vittorio Ferrari,et al.  Fast Object Segmentation in Unconstrained Video , 2013, 2013 IEEE International Conference on Computer Vision.

[61]  Narendra Ahuja,et al.  Exploiting nonlocal spatiotemporal structure for video segmentation , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.