Contextualized trajectory parsing with spatiotemporal graph.

This work investigates how to automatically parse object trajectories in surveillance videos, with the aim of jointly solving three subproblems: 1) spatial segmentation, 2) temporal tracking, and 3) object categorization. We present a novel representation, the spatiotemporal graph (ST-Graph), in which: 1) graph nodes express motion primitives, each representing a short sequence of small patches over consecutive images, and 2) every two neighboring nodes are linked by either a positive edge or a negative edge, describing their collaborative or exclusive relationship with respect to belonging to the same object trajectory. Phrasing trajectory parsing as a graph multicoloring problem, we propose a unified probabilistic formulation that integrates various types of contextual knowledge as informative priors. An efficient composite cluster sampling algorithm is employed to search for the optimal solution by exploiting both the collaborative and the exclusive relationships between nodes. The proposed framework is evaluated on challenging videos from public datasets, and the results show that it achieves state-of-the-art tracking accuracy.
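
The signed-edge graph and the coloring view described above can be illustrated with a toy sketch. This is not the authors' implementation: the function names, the edge encoding (`+1` for a positive edge, `-1` for a negative edge), the `keep_prob` parameter, and the greedy recoloring step are all assumptions made for illustration. In particular, the greedy step stands in for the probabilistic composite cluster sampling of the paper and ignores the context priors entirely.

```python
import random

def violated_edges(colors, edges):
    """Count edges whose sign disagrees with the coloring.

    edges: list of (u, v, sign); sign > 0 is a positive (collaborative)
    edge, sign < 0 is a negative (exclusive) edge. Hypothetical encoding.
    """
    count = 0
    for u, v, sign in edges:
        same = colors[u] == colors[v]
        if (sign > 0) != same:
            count += 1
    return count

def sample_clusters(num_nodes, edges, colors, keep_prob, rng):
    """Group nodes by randomly keeping positive edges between same-colored nodes."""
    parent = list(range(num_nodes))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for u, v, sign in edges:
        if sign > 0 and colors[u] == colors[v] and rng.random() < keep_prob:
            parent[find(u)] = find(v)

    clusters = {}
    for n in range(num_nodes):
        clusters.setdefault(find(n), []).append(n)
    return sorted(clusters.values())

def greedy_recolor(num_nodes, edges, colors, num_colors,
                   keep_prob=0.8, steps=50, seed=0):
    """Repeatedly recolor one sampled cluster with its least-violating color.

    A greedy simplification, not a proper MCMC sampler.
    """
    rng = random.Random(seed)
    colors = list(colors)
    for _ in range(steps):
        clusters = sample_clusters(num_nodes, edges, colors, keep_prob, rng)
        cluster = set(rng.choice(clusters))

        def cost(c):
            trial = [c if n in cluster else colors[n] for n in range(num_nodes)]
            return violated_edges(trial, edges)

        best = min(range(num_colors), key=cost)
        for n in cluster:
            colors[n] = best
    return colors

# Toy ST-Graph: nodes 0-1 and 2-3 are linked positively, 1-2 negatively.
toy_edges = [(0, 1, +1), (2, 3, +1), (1, 2, -1)]
initial = [0, 0, 0, 0]
result = greedy_recolor(4, toy_edges, initial, num_colors=2)
print(result, violated_edges(result, toy_edges))
```

In this toy graph, each color plays the role of one object trajectory, and a coloring such as `[0, 0, 1, 1]` satisfies every edge. Moving whole clusters at once, rather than single nodes, is what lets the search respect the collaborative edges while resolving the exclusive ones.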
