Object Discovery in Videos as Foreground Motion Clustering

We consider the problem of providing dense segmentation masks for object discovery in videos. We formulate the object discovery problem as foreground motion clustering, where the goal is to cluster foreground pixels in videos into different objects. We introduce a novel pixel-trajectory recurrent neural network that learns feature embeddings of foreground pixel trajectories linked across time. By clustering the pixel trajectories using the learned feature embeddings, our method establishes correspondences between foreground object masks across video frames. To demonstrate the effectiveness of our framework for object discovery, we conduct experiments on commonly used datasets for motion segmentation, where we achieve state-of-the-art performance.

[1]  Berthold K. P. Horn,et al.  Determining Optical Flow , 1981, Other Conferences.

[2]  King-Sun Fu,et al.  IEEE Transactions on Pattern Analysis and Machine Intelligence Publication Information , 2004, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[3]  David J. Fleet,et al.  Performance of optical flow techniques , 1994, International Journal of Computer Vision.

[4]  Jie Yang,et al.  Flexible background mixture models for foreground segmentation , 2006, Image Vis. Comput..

[5]  Ramakant Nevatia,et al.  Global data association for multi-object tracking using network flows , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[6]  Jitendra Malik,et al.  Object Segmentation by Long Term Analysis of Point Trajectories , 2010, ECCV.

[7]  Takumi Kobayashi,et al.  Von Mises-Fisher Mean Shift for Clustering on a Hypersphere , 2010, 2010 20th International Conference on Pattern Recognition.

[8]  Kurt Keutzer,et al.  Dense Point Trajectories by GPU-Accelerated Large Displacement Optical Flow , 2010, ECCV.

[9]  Xiaoqin Zhang,et al.  Incremental Tensor Subspace Learning and Its Applications to Foreground Segmentation and Tracking , 2011, International Journal of Computer Vision.

[10]  Ming-Hsuan Yang,et al.  Robust Object Tracking with Online Multiple Instance Learning , 2011, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[11]  Pascal Fua,et al.  Ieee Transactions on Pattern Analysis and Machine Intelligence 1 Multiple Object Tracking Using K-shortest Paths Optimization , 2022 .

[12]  Vladlen Koltun,et al.  Efficient Inference in Fully Connected CRFs with Gaussian Edge Potentials , 2011, NIPS.

[13]  P. Cochat,et al.  Et al , 2008, Archives de pediatrie : organe officiel de la Societe francaise de pediatrie.

[14]  Katerina Fragkiadaki,et al.  Video segmentation by tracing discontinuities in a trajectory embedding , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[15]  Allen R. Hanson,et al.  Coherent Motion Segmentation in Moving Camera Videos Using Optical Flow Orientations , 2013, 2013 IEEE International Conference on Computer Vision.

[16]  Vittorio Ferrari,et al.  Fast Object Segmentation in Unconstrained Video , 2013, 2013 IEEE International Conference on Computer Vision.

[17]  Andrew Zisserman,et al.  Two-Stream Convolutional Networks for Action Recognition in Videos , 2014, NIPS.

[18]  Jitendra Malik,et al.  Ieee Transactions on Pattern Analysis and Machine Intelligence Segmentation of Moving Objects by Long Term Video Analysis , 2022 .

[19]  Michal Irani,et al.  Video Segmentation by Non-Local Consensus voting , 2014, BMVC.

[20]  Thomas Brox,et al.  Motion Trajectory Segmentation via Minimum Cost Multicuts , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[21]  Silvio Savarese,et al.  Learning to Track: Online Multi-object Tracking by Decision Making , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[22]  Brian Taylor,et al.  Causal video object segmentation from persistence of occlusions , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[23]  Thomas Brox,et al.  U-Net: Convolutional Networks for Biomedical Image Segmentation , 2015, MICCAI.

[24]  Leonidas J. Guibas,et al.  ShapeNet: An Information-Rich 3D Model Repository , 2015, ArXiv.

[25]  Luc Van Gool,et al.  A Benchmark Dataset and Evaluation Methodology for Video Object Segmentation , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[26]  Thomas Brox,et al.  A Large Dataset to Train Convolutional Networks for Disparity, Optical Flow, and Scene Flow Estimation , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[27]  Ronan Collobert,et al.  Learning to Refine Object Segments , 2016, ECCV.

[28]  Vibhav Vineet,et al.  Struck: Structured Output Tracking with Kernels , 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[29]  Andrew Zisserman,et al.  Convolutional Two-Stream Network Fusion for Video Action Recognition , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[30]  Erik G. Learned-Miller,et al.  A Detailed Rubric for Motion Segmentation , 2016, ArXiv.

[31]  Erik G. Learned-Miller,et al.  It's Moving! A Probabilistic Model for Causal Motion Segmentation in Moving Camera Videos , 2016, ECCV.

[32]  Christopher Joseph Pal,et al.  Delving Deeper into Convolutional Networks for Learning Video Representations , 2015, ICLR.

[33]  Kristen Grauman,et al.  FusionSeg: Learning to Combine Motion and Appearance for Fully Automatic Segmentation of Generic Objects in Videos , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[34]  Luc Van Gool,et al.  Semantic Instance Segmentation with a Discriminative Loss Function , 2017, ArXiv.

[35]  Oliver Brock,et al.  Interactive Perception: Leveraging Action in Perception and Perception in Action , 2016, IEEE Transactions on Robotics.

[36]  Karteek Alahari,et al.  Learning Motion Patterns in Videos , 2016, CVPR.

[37]  Trevor Darrell,et al.  Learning Features by Watching Objects Move , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[38]  Margret Keuper,et al.  Higher-Order Minimum Cost Lifted Multicuts for Motion Segmentation , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[39]  Luc Van Gool,et al.  The 2017 DAVIS Challenge on Video Object Segmentation , 2017, ArXiv.

[40]  Wenguan Wang,et al.  Super-Trajectory for Video Segmentation , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[41]  Dieter Fox,et al.  DA-RNN: Semantic Mapping with Data Associated Recurrent Neural Networks , 2017, Robotics: Science and Systems.

[42]  Karteek Alahari,et al.  Learning Video Object Segmentation with Visual Memory , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[43]  Peng Wang,et al.  Semantic Instance Segmentation via Deep Metric Learning , 2017, ArXiv.

[44]  Julian Straub,et al.  Nonparametric directional perception , 2017 .

[45]  Thomas Brox,et al.  FlowNet 2.0: Evolution of Optical Flow Estimation with Deep Networks , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[46]  Erik G. Learned-Miller,et al.  The Best of Both Worlds: Combining CNNs and Geometric Constraints for Hierarchical Motion Segmentation , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[47]  Shu Kong,et al.  Recurrent Pixel Embedding for Instance Grouping , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[48]  Andrea Vedaldi,et al.  Semi-convolutional Operators for Instance Segmentation , 2018, ECCV.

[49]  Kaiming He,et al.  Group Normalization , 2018, ECCV.

[50]  Jeannette Bohg,et al.  Motion-Based Object Segmentation Based on Dense RGB-D Scene Flow , 2018, IEEE Robotics and Automation Letters.

[51]  Ross B. Girshick,et al.  Mask R-CNN , 2017, 1703.06870.

[52]  Thomas Brox,et al.  Motion Segmentation & Multiple Object Tracking by Correlation Co-Clustering , 2020, IEEE Transactions on Pattern Analysis and Machine Intelligence.