A Multi-cut Formulation for Joint Segmentation and Tracking of Multiple Objects

Recently, minimum cost multicut formulations have been proposed and proven successful in both motion trajectory segmentation and multi-target tracking scenarios. Both tasks benefit from decomposing a graphical model into an optimal number of connected components based on attractive and repulsive pairwise terms. The two tasks are formulated at different levels of granularity and, accordingly, leverage mostly local information for motion segmentation and mostly high-level information for multi-target tracking. In this paper we argue that point trajectories and their local relationships can contribute to the high-level task of multi-target tracking, and that high-level cues from object detection and tracking help to solve motion segmentation. We propose a joint graphical model for point trajectories and object detections whose multicuts are solutions to the motion segmentation and multi-target tracking problems at once. Results on the FBMS59 motion segmentation benchmark as well as on pedestrian tracking sequences from the 2D MOT 2015 benchmark demonstrate the promise of this joint approach.
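To make the idea of decomposing a graph by attractive and repulsive pairwise terms concrete, the following is a minimal sketch of a minimum cost multicut on a toy graph. It is not the solver used in the paper: it enumerates all decompositions by brute force, which is only feasible for a handful of nodes. The sign convention (positive cost = attractive, negative cost = repulsive, cost paid when an edge is cut), the function names, and the four-node example are all assumptions made for illustration.

```python
from typing import Dict, Iterator, List, Tuple

Edge = Tuple[int, int]


def set_partitions(n: int) -> Iterator[Tuple[int, ...]]:
    """Yield each partition of n nodes once, as a tuple of component labels."""
    def extend(prefix: List[int], max_label: int) -> Iterator[Tuple[int, ...]]:
        if len(prefix) == n:
            yield tuple(prefix)
            return
        # A node may join an existing component or open exactly one new one.
        for label in range(max_label + 2):
            yield from extend(prefix + [label], max(max_label, label))
    yield from extend([], -1)


def components_connected(labels: Tuple[int, ...], edges: List[Edge]) -> bool:
    """Check that every component induces a connected subgraph."""
    for label in set(labels):
        members = {v for v, l in enumerate(labels) if l == label}
        start = next(iter(members))
        seen, stack = {start}, [start]
        while stack:
            u = stack.pop()
            for a, b in edges:
                if u in (a, b):
                    w = b if a == u else a
                    if w in members and w not in seen:
                        seen.add(w)
                        stack.append(w)
        if seen != members:
            return False
    return True


def min_cost_multicut(n: int, costs: Dict[Edge, float]) -> Tuple[Tuple[int, ...], float]:
    """Brute-force search for the decomposition with the lowest total cut cost."""
    edges = list(costs)
    best_labels, best_cost = None, float("inf")
    for labels in set_partitions(n):
        if not components_connected(labels, edges):
            continue
        # Assumed convention: an edge contributes its cost only when it is cut.
        cost = sum(c for (u, v), c in costs.items() if labels[u] != labels[v])
        if cost < best_cost:
            best_labels, best_cost = labels, cost
    return best_labels, best_cost


# Hypothetical toy graph: nodes 0 and 1 lie on one object, nodes 2 and 3 on another.
costs = {
    (0, 1): 2.0,   # attractive: cutting is penalized
    (2, 3): 1.5,   # attractive
    (1, 2): -1.0,  # repulsive: cutting is rewarded
    (0, 3): -0.5,  # repulsive
}
labels, cost = min_cost_multicut(4, costs)
print(labels, cost)  # (0, 0, 1, 1) -1.5
```

Note that the number of components is not given as input; it falls out of the optimization, which is what "an optimal number of connected components" refers to. In a joint model of the kind the abstract describes, the nodes would be a mix of point trajectories and object detections, but the toy graph here makes no attempt to model that.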
