Inferring Hidden Statuses and Actions in Video by Causal Reasoning

In the physical world, cause and effect are inseparable: ambient conditions trigger humans to perform actions, thereby driving status changes of objects. In video, these actions and statuses may be hidden due to ambiguity, occlusion, or because they are otherwise unobservable, but humans nevertheless perceive them. In this paper, we extend the Causal And-Or Graph (C-AOG) to a sequential model representing actions and their effects on objects over time, and we build a probability model for it. For inference, we apply a Viterbi algorithm, grounded in probabilistic detections from video, to fill in hidden and misdetected actions and statuses. We analyze our method on a new video dataset that showcases causes and effects. Our results demonstrate the effectiveness of reasoning with causality over time.
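
To illustrate the inference idea only, the following is a minimal sketch of Viterbi decoding over a single binary object status with noisy per-frame detections. It is a plain hidden-Markov-style example, not the sequential C-AOG model itself. The status names ("off", "on"), the transition probabilities, the uniform prior, and the detection scores are all assumptions chosen for illustration; a frame marked None stands for a status that is hidden in that frame.

```python
import math

def viterbi(detections, states, log_prior, log_trans, log_emit):
    """Return the most probable status sequence for a list of detections."""
    # best[t][s]: log-probability of the best path that ends in status s at frame t
    best = [{s: log_prior[s] + log_emit(s, detections[0]) for s in states}]
    back = []  # back[t-1][s]: best predecessor of status s at frame t
    for obs in detections[1:]:
        scores, pointers = {}, {}
        for s in states:
            prev = max(states, key=lambda p: best[-1][p] + log_trans[p][s])
            scores[s] = best[-1][prev] + log_trans[prev][s] + log_emit(s, obs)
            pointers[s] = prev
        best.append(scores)
        back.append(pointers)
    # Trace the best path backwards from the best final status.
    path = [max(states, key=lambda s: best[-1][s])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path

# Hypothetical toy setup (all numbers are illustrative assumptions).
STATES = ["off", "on"]
LOG_PRIOR = {s: math.log(0.5) for s in STATES}
LOG_TRANS = {
    "off": {"off": math.log(0.9), "on": math.log(0.1)},
    "on": {"off": math.log(0.1), "on": math.log(0.9)},
}

def log_emit(state, p_on):
    """Log-likelihood of a detection score p_on = detected P(status is 'on')."""
    if p_on is None:  # hidden frame: no evidence for either status
        return 0.0
    return math.log(p_on if state == "on" else 1.0 - p_on)

# Frame 3 is hidden (None); frame 4 is a likely misdetection (0.3).
detections = [0.2, 0.1, 0.9, None, 0.3, 0.9, 0.8]
path = viterbi(detections, STATES, LOG_PRIOR, LOG_TRANS, log_emit)
print(path)
```

In this toy run the decoded sequence stays "on" through both the hidden frame and the low-scoring frame, because switching status twice costs more under the assumed transition probabilities than explaining away one weak detection.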
