Temporal causality for the analysis of visual events

We present a novel approach to the causal temporal analysis of event data from video content. Our key observation is that the sequence of visual words produced by a space-time dictionary representation of a video sequence can be interpreted as a multivariate point process. Using a spectral version of the pairwise test for Granger causality, we identify patterns of interaction between words and group them into independent causal sets. We demonstrate qualitatively that this produces semantically meaningful groupings, and quantitatively that these groupings improve performance in retrieving and classifying social games from unstructured videos.
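
To make the first two steps concrete, the sketch below bins visual-word occurrences into per-frame counts (one simple way to treat them as a multivariate point process) and scores a pairwise causal influence between two words. It is illustrative only: it uses a time-domain, least-squares autoregressive form of the Granger measure in place of the spectral test described above, and the function names, the bin width of one frame, and the model order are all assumptions, not the method's actual settings. Grouping words into causal sets is not shown.

```python
import numpy as np

def words_to_counts(frames, words, n_words, n_frames):
    """Bin word occurrences: counts[k, t] = occurrences of word k in frame t.

    Hypothetical helper; one frame per bin is an assumption.
    """
    counts = np.zeros((n_words, n_frames))
    np.add.at(counts, (np.asarray(words), np.asarray(frames)), 1)
    return counts

def granger_strength(x, y, order=2):
    """Time-domain Granger measure of the influence y -> x.

    Returns ln(var_restricted / var_full), where the restricted model
    predicts x from its own past and the full model also uses the past
    of y. This is a stand-in for the spectral test, not a copy of it.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    target = x[order:]
    # Column k-1 holds the series delayed by k samples.
    own = np.column_stack([x[order - k:n - k] for k in range(1, order + 1)])
    other = np.column_stack([y[order - k:n - k] for k in range(1, order + 1)])
    ones = np.ones((n - order, 1))

    def residual_variance(design):
        coef, *_ = np.linalg.lstsq(design, target, rcond=None)
        return np.mean((target - design @ coef) ** 2)

    restricted = residual_variance(np.hstack([ones, own]))
    full = residual_variance(np.hstack([ones, own, other]))
    return np.log(restricted / full)

# Synthetic example: word 1 tends to fire one frame after word 0.
rng = np.random.default_rng(0)
cause = rng.poisson(1.0, 500).astype(float)
effect = np.roll(cause, 1) + rng.poisson(0.2, 500)
forward = granger_strength(effect, cause)   # influence cause -> effect
backward = granger_strength(cause, effect)  # influence effect -> cause
```

On this synthetic pair the forward score should be clearly larger than the backward one; thresholding such pairwise scores would yield a directed graph over words, from which groupings could then be derived.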
