Random field topic model for semantic region analysis in crowded scenes from tracklets

In this paper, a Random Field Topic (RFT) model is proposed for semantic region analysis from the motions of objects in crowded scenes. Unlike existing approaches, which learn semantic regions either from optical flow or from complete trajectories, our model assumes that fragments of trajectories (called tracklets) are observed in crowded scenes. It extends the Latent Dirichlet Allocation (LDA) topic model by integrating Markov random fields (MRFs) as a prior to enforce spatial and temporal coherence between tracklets during the learning process. Two kinds of MRF are defined: a pairwise MRF and a forest of randomly spanning trees. Another contribution of this model is to include sources and sinks as a high-level semantic prior, which effectively improves the learning of semantic regions and the clustering of tracklets. Experiments on a large-scale data set, which includes 40,000+ tracklets collected from the crowded New York Grand Central station, show that our model outperforms state-of-the-art methods both on qualitative results of learning semantic regions and on quantitative results of clustering tracklets.
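
The idea of combining an LDA-style topic term with a pairwise MRF prior can be illustrated with a minimal sketch. This is not the RFT formulation from the paper: it assumes a simple Potts-style agreement potential, and the function name `topic_conditional`, the `coupling` strength, and the neighbor weights are all hypothetical choices made for illustration.

```python
import math


def topic_conditional(topic_counts, neighbor_topics, neighbor_weights,
                      alpha=0.5, coupling=1.0):
    """Illustrative conditional distribution over topics for one tracklet.

    topic_counts[k]  : how often topic k is already used by this tracklet
    neighbor_topics  : current topic label of each neighboring tracklet
    neighbor_weights : spatial/temporal affinity of each neighbor (assumed given)
    alpha            : symmetric Dirichlet smoothing (assumed value)
    coupling         : strength of the pairwise prior (assumed value)
    """
    scores = []
    for k, count in enumerate(topic_counts):
        # LDA-style term: smoothed usage of topic k within the tracklet.
        lda_term = count + alpha
        # Potts-style pairwise term: total weight of neighbors labeled k.
        agreement = sum(w for z, w in zip(neighbor_topics, neighbor_weights)
                        if z == k)
        scores.append(lda_term * math.exp(coupling * agreement))
    total = sum(scores)
    return [s / total for s in scores]


# Without neighbors the two topics are equally likely.
no_prior = topic_conditional([3, 3], [], [])
# Two neighbors labeled with topic 1 pull the tracklet toward topic 1.
with_prior = topic_conditional([3, 3], [1, 1], [0.8, 0.6])
```

Under these assumptions, neighboring tracklets that share a topic label raise the probability of that label, which is the kind of spatial and temporal coherence the MRF prior is meant to encourage.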
