Motion Segmentation & Multiple Object Tracking by Correlation Co-Clustering

Models for computer vision are commonly defined either with respect to low-level concepts, such as pixels that are to be grouped, or with respect to high-level concepts, such as semantic objects that are to be detected and tracked. Combining bottom-up grouping with top-down detection and tracking, although highly desirable, is a challenging problem. We state this joint problem as a co-clustering problem that is principled and tractable by existing algorithms. We demonstrate the effectiveness of this approach by combining bottom-up motion segmentation by grouping of point trajectories with high-level multiple object tracking by clustering of bounding boxes. We show that solving the joint problem is beneficial at the low level, in terms of the FBMS59 motion segmentation benchmark, and at the high level, in terms of the multiple object tracking benchmarks MOT15, MOT16, and the MOT17 challenge, where it is state-of-the-art in some metrics.
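To make the co-clustering idea concrete, the following is a minimal sketch of correlation clustering on a toy joint graph whose nodes are both point trajectories and bounding boxes. It is an illustration only and not the solver used in this work: the greedy merging heuristic, the function name `greedy_correlation_clustering`, the node names (`t1`..`t4`, `b1`..`b3`), and all edge weights are assumptions chosen for this example. Positive weights express evidence that two nodes belong to the same object, negative weights express evidence that they do not.

```python
from itertools import combinations


def greedy_correlation_clustering(nodes, weights):
    """Greedy heuristic: repeatedly merge the two clusters joined by the
    largest positive total weight; stop when no positive link remains.

    weights maps frozenset({u, v}) -> float (missing pairs count as 0).
    """
    clusters = [{n} for n in nodes]

    def link(a, b):
        # Total weight of all edges running between clusters a and b.
        return sum(weights.get(frozenset((u, v)), 0.0) for u in a for v in b)

    while True:
        best, pair = 0.0, None
        for i, j in combinations(range(len(clusters)), 2):
            w = link(clusters[i], clusters[j])
            if w > best:
                best, pair = w, (i, j)
        if pair is None:
            return clusters
        i, j = pair
        clusters[i] |= clusters[j]
        del clusters[j]


# Hypothetical toy instance: four trajectories and three bounding boxes.
nodes = ["t1", "t2", "t3", "t4", "b1", "b2", "b3"]
edges = {
    ("t1", "t2"): 2.0,   # trajectories with similar motion
    ("t3", "t4"): 2.0,
    ("t2", "t3"): -0.5,  # weak motion evidence against joining
    ("b1", "b2"): 1.5,   # boxes overlapping across frames
    ("b1", "b3"): -3.0,  # distinct detections in the same frame
    ("t1", "b1"): 1.0,   # trajectory passes through the box
    ("t2", "b2"): 1.0,
    ("t3", "b3"): 1.0,
    ("t4", "b3"): 1.0,
    ("t2", "b3"): -1.0,  # trajectory lies outside the box
}
weights = {frozenset(e): w for e, w in edges.items()}

clusters = greedy_correlation_clustering(nodes, weights)
print(clusters)
```

On this toy graph the heuristic returns two clusters, one holding `t1`, `t2`, `b1`, `b2` and the other holding `t3`, `t4`, `b3`. The weak negative motion cue between `t2` and `t3` is reinforced by the strong repulsion between the boxes `b1` and `b3`, which is the kind of interaction between low-level and high-level evidence that a joint formulation is meant to exploit.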
