Categorizing bi-object video activities using bag of segments and causality features

We address the problem of recognizing video activities that involve two interacting moving objects captured by a surveillance camera. We develop a novel video activity representation scheme called 'bag of segments'. In this scheme, each video session is represented as a collection of independent segments, and each segment is assigned memberships to a set of pre-learned visual patterns that we call codewords. To better represent video segments that contain object interaction, we design a set of new features based on prediction filter responses and the Granger Causality Test (GCT). These features capture the inter-relationship between the moving objects and are combined with conventional features such as position and velocity. We validate the proposed method on the task of video activity classification with extensive experiments on a surveillance database of 867 video sessions.
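The abstract does not say how the GCT-based features are computed, so the following is only a minimal sketch of the textbook Granger causality F-test applied to two 1-D signals (for example, one coordinate of each object's trajectory). The function names, the lag order, and the synthetic data are assumptions made for illustration and are not taken from the paper.

```python
import numpy as np


def lagged_design(target, sources, order):
    """Build a regression matrix from `order` past samples of each source series."""
    n = len(target)
    cols = [np.ones(n - order)]  # intercept term
    for s in sources:
        for k in range(1, order + 1):
            cols.append(s[order - k : n - k])  # value of s at time t - k
    return np.column_stack(cols), target[order:]


def residual_sum_squares(X, y):
    """Least-squares fit of y on X; return the residual sum of squares."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    r = y - X @ coef
    return float(r @ r)


def granger_f_statistic(x, y, order=2):
    """F statistic for the hypothesis 'past y helps predict x' (y Granger-causes x)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    X_restricted, t = lagged_design(x, [x], order)       # x predicted from its own past
    X_full, _ = lagged_design(x, [x, y], order)          # x predicted from past x and past y
    rss_restricted = residual_sum_squares(X_restricted, t)
    rss_full = residual_sum_squares(X_full, t)
    dof = len(t) - X_full.shape[1]
    return ((rss_restricted - rss_full) / order) / (rss_full / dof)


# Synthetic example (assumed data): object A "follows" object B with a one-step delay.
rng = np.random.default_rng(0)
b = rng.normal(size=300)
a = np.zeros(300)
a[1:] = 0.8 * b[:-1] + 0.1 * rng.normal(size=299)

f_b_causes_a = granger_f_statistic(a, b)  # expected to be large
f_a_causes_b = granger_f_statistic(b, a)  # expected to be small
```

A pair of such statistics, one per direction, could serve as an asymmetric interaction feature for a segment, alongside position and velocity. Whether the paper uses the statistic in this form is not stated in the abstract.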
