Toward a causal topic model for video scene analysis

Unsupervised identification of different types of activity in video data has many applications, such as anomaly detection, automated tagging of video for search, and cognitive modeling. Topic models, originally developed for corpus analysis, have recently been used to identify different types of activity in video. Among topic models, probabilistic latent semantic analysis (pLSA) provides an efficient method for identifying clusters of activity in video. This paper integrates pLSA with the causal graphical models of Pearl [1] to learn visual event structures and their temporal relationships simultaneously. The model is fully generative. Learning uses a noisy-OR style temporal dependence, a form well known to identify the same causal patterns that human learners do. The addition of temporal learning allows the system to model temporally ordered and long-range temporal dependencies that traditional topic models cannot. The model identifies human-recognizable event structures in video and successfully classifies videos of human activity learning.
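For readers unfamiliar with the noisy-OR form, the following is a minimal sketch of a generic noisy-OR combination rule, not the paper's implementation. It assumes each active cause independently produces the effect with its own strength, plus an optional background "leak" probability. The names `noisy_or`, `active`, `strengths`, and `leak` are hypothetical, and how the paper ties causes to earlier-occurring topics is not specified in this abstract.

```python
def noisy_or(active, strengths, leak=0.0):
    """Probability that an effect occurs under a generic noisy-OR model.

    active    -- booleans, whether each candidate cause is present
    strengths -- per-cause probabilities of producing the effect alone
    leak      -- background probability of the effect with no active cause
    (All names are illustrative assumptions, not taken from the paper.)
    """
    # Probability that nothing produces the effect: the background fails
    # and every active cause independently fails.
    p_none = 1.0 - leak
    for is_on, strength in zip(active, strengths):
        if is_on:
            p_none *= 1.0 - strength
    return 1.0 - p_none


if __name__ == "__main__":
    # Two of three hypothetical causes are active.
    print(noisy_or([True, False, True], [0.5, 0.9, 0.2]))
```

In a temporal setting, the "causes" could be events observed in an earlier time window and the "effect" an event in a later one; that mapping is an assumption made here for illustration only.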

[1] Tae-Kyun Kim, et al. Learning Motion Categories using both Semantic and Structural Information, 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[2] Philip M. Fernbach, et al. Causal learning with local computations, 2009, Journal of experimental psychology. Learning, memory, and cognition.

[3] Hagai Attias, et al. Inferring Parameters and Structure of Latent Variable Models by Variational Bayes, 1999, UAI.

[4] Shaogang Gong, et al. Modelling activity global temporal dependencies using Time Delayed Probabilistic Graphical Model, 2009, 2009 IEEE 12th International Conference on Computer Vision.

[5] Thomas Serre, et al. A Biologically Inspired System for Action Recognition, 2007, 2007 IEEE 11th International Conference on Computer Vision.

[6] Martial Hebert, et al. Efficient visual event detection using volumetric features, 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[7] Shaogang Gong, et al. A Markov Clustering Topic Model for mining behaviour in video, 2009, 2009 IEEE 12th International Conference on Computer Vision.

[8] Michael I. Jordan, et al. Latent Dirichlet Allocation, 2001, J. Mach. Learn. Res.

[9] John F. Canny, et al. A Computational Approach to Edge Detection, 1986, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[10] Tsuhan Chen, et al. A Topic-Motion Model for Unsupervised Video Object Discovery, 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[11] Alexei A. Efros, et al. Discovering objects and their location in images, 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[12] Alex Pentland, et al. Coupled hidden Markov models for complex action recognition, 1997, Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[13] Carl E. Rasmussen, et al. Factorial Hidden Markov Models, 1997.

[14] Robert Givan, et al. Specific-to-General Learning for Temporal Events with Application to Learning Event Definitions from Video, 2002, J. Artif. Intell. Res.

[15] David Maxwell Chickering, et al. Learning Equivalence Classes of Bayesian Network Structures, 1996, UAI.

[16] P. Cheng. From covariation to causation: A causal power theory, 1997.

[17] James M. Rehg, et al. Temporal causality for the analysis of visual events, 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[18] Michael D. Lee, et al. The effect of causal strength on the use of causal and similarity-based information in feature inference, 2010.

[19] Judea Pearl, et al. Probabilistic reasoning in intelligent systems - networks of plausible inference, 1991, Morgan Kaufmann series in representation and reasoning.

[20] Alan L. Yuille, et al. The Noisy-Logical Distribution and its Application to Causal Inference, 2007, NIPS.

[21] Juan Carlos Niebles, et al. Unsupervised Learning of Human Action Categories Using Spatial-Temporal Words, 2006, BMVC.

[22] Thomas Hofmann, et al. Unsupervised Learning by Probabilistic Latent Semantic Analysis, 2004, Machine Learning.

[23] J. Pearl. Causality: Models, Reasoning and Inference, 2000.

[24] Jean-Marc Odobez, et al. Topic models for scene analysis and abnormality detection, 2009, 2009 IEEE 12th International Conference on Computer Vision Workshops, ICCV Workshops.

[25] É. Moulines, et al. Convergence of a stochastic approximation version of the EM algorithm, 1999.

[26] W. Eric L. Grimson, et al. Unsupervised Activity Perception by Hierarchical Bayesian Models, 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[27] Thomas L. Griffiths, et al. Learning the Form of Causal Relationships Using Hierarchical Bayesian Models, 2009, Cogn. Sci.

[28] Thomas L. Griffiths, et al. Connecting human and machine learning via probabilistic models of cognition, 2009, INTERSPEECH.

[29] Brian J. Taylor, et al. Automatic identification of quasi-experimental designs for discovering causal knowledge, 2008, KDD.

[30] David M. Sobel, et al. A theory of causal learning in children: causal maps and Bayes nets, 2004, Psychological review.

[31] D. Rubin, et al. Maximum likelihood from incomplete data via the EM - algorithm plus discussions on the paper, 1977.

[32] C. Granger. Investigating causal relations by econometric models and cross-spectral methods, 1969.

[33] Michael I. Jordan, et al. Hierarchical Dirichlet Processes, 2006.

[34] Luc Van Gool, et al. What's going on? Discovering spatio-temporal dependencies in dynamic scenes, 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[35] Jiji Zhang, et al. Causal Reasoning with Ancestral Graphs, 2008, J. Mach. Learn. Res.

[36] B. Rehder. A causal-model theory of conceptual representation and categorization, 2003, Journal of experimental psychology. Learning, memory, and cognition.