Trajectory parsing by cluster sampling in spatio-temporal graph

The objective of this paper is to parse object trajectories in surveillance video in the presence of occlusion, interruption, and background clutter. We present a spatio-temporal graph (ST-Graph) representation and a cluster sampling algorithm based on deferred inference. An object trajectory in the ST-Graph is represented by a bundle of “motion primitives”, each of which consists of a small number of matched features (interesting patches) generated by adaptive feature pursuit and a tracking process. Each motion primitive is a graph vertex and has six bonds connecting it to neighboring vertices. Based on the ST-Graph, we jointly solve three tasks by flipping the labels of the motion primitives: 1) spatial segmentation, 2) temporal correspondence, and 3) object recognition. We also use scene geometry and statistical information as a strong prior. The inference is then formulated as a Markov chain and solved by efficient cluster sampling. We apply the proposed approach to various challenging videos from a number of public datasets and show that it outperforms other state-of-the-art methods.
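
As a rough illustration of what "flipping the labels" of a cluster of vertices can look like, the following minimal Python sketch shows only the cluster-proposal step of a generic Swendsen-Wang-style sampler on a labeled graph. It is not the paper's algorithm: the function names `sample_cluster` and `flip_cluster`, the single constant `edge_prob`, and the toy four-vertex chain are all assumptions made for illustration, and the Metropolis-Hastings acceptance test that a full sampler would apply to each proposed flip is omitted.

```python
import random
from collections import deque


def sample_cluster(neighbors, labels, edge_prob, rng):
    """Grow one cluster from a random seed vertex (hypothetical helper).

    A bond between two vertices with the same label is kept ("turned on")
    with probability edge_prob; the cluster is the set of vertices reachable
    from the seed through kept bonds.
    """
    seed = rng.choice(sorted(neighbors))
    cluster = {seed}
    queue = deque([seed])
    while queue:
        v = queue.popleft()
        for u in neighbors[v]:
            same_label = labels[u] == labels[v]
            if u not in cluster and same_label and rng.random() < edge_prob:
                cluster.add(u)
                queue.append(u)
    return cluster


def flip_cluster(labels, cluster, new_label):
    """Return a copy of labels in which every cluster vertex gets new_label.

    Assumption: the proposal is always applied; a real sampler would accept
    or reject it based on the posterior ratio.
    """
    flipped = dict(labels)
    for v in cluster:
        flipped[v] = new_label
    return flipped


# Toy chain of four vertices (an assumption; the ST-Graph in the paper
# gives each vertex six bonds).
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
labels = {0: "a", 1: "a", 2: "b", 3: "b"}

rng = random.Random(0)
cluster = sample_cluster(neighbors, labels, edge_prob=0.7, rng=rng)
proposal = flip_cluster(labels, cluster, "c")
print(cluster, proposal)
```

Relabeling a whole cluster in one move, rather than one vertex at a time, is what lets this family of samplers make large changes to a labeling in a single step.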
