Temporal Context Analysis for Action Recognition in Multi-agent Scenarios

In multi-agent scenarios such as sports videos, different players perform multiple actions. These actions do not necessarily occur strictly in sequence; they can also happen in parallel. Approaches that consider only a single stream of actions cannot handle such scenarios. To recognize the actions correctly, the temporal and causal relationships between the action streams, such as "concurrence", "mutual exclusion" and "triggering", need to be captured. In this paper, we present a novel method for action recognition in multi-agent scenarios that analyzes the relationships among temporally contextual actions. The multiple streams of actions are modeled by a Dynamic Bayesian Network (DBN) containing several temporal processes, one for each type of action. Unlike the Coupled Hidden Markov Model (CHMM), the model builds only the necessary interlinks between the temporal processes, using a structure learning algorithm to capture the salient relationships. Empirical results on real-world video data demonstrate the effectiveness of the proposed method.
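
The idea of keeping only the salient interlinks between action streams can be illustrated with a small sketch. The abstract does not specify the structure learning algorithm, so the code below substitutes a deliberately simple stand-in: it scores each candidate cross-stream link by the mutual information between one stream's state at time t-1 and another stream's state at time t, and keeps links whose score reaches a threshold. The stream names, the toy binary sequences, the function names and the threshold are all hypothetical and chosen only for illustration.

```python
import math
from collections import Counter
from itertools import permutations


def lagged_mutual_information(src, dst):
    """Mutual information (in nats) between src[t-1] and dst[t]."""
    pairs = list(zip(src[:-1], dst[1:]))
    n = len(pairs)
    joint = Counter(pairs)
    p_src = Counter(a for a, _ in pairs)
    p_dst = Counter(b for _, b in pairs)
    mi = 0.0
    for (a, b), count in joint.items():
        # p(a,b) * log( p(a,b) / (p(a) * p(b)) ), written with raw counts
        mi += (count / n) * math.log(count * n / (p_src[a] * p_dst[b]))
    return mi


def select_links(streams, threshold):
    """Keep directed cross-stream links whose lagged MI reaches the threshold.

    This is an illustrative heuristic, not the paper's learning algorithm.
    """
    links = []
    for src, dst in permutations(streams, 2):
        if lagged_mutual_information(streams[src], streams[dst]) >= threshold:
            links.append((src, dst))
    return sorted(links)


# Hypothetical per-frame action indicators (1 = action active in that frame).
streams = {
    "serve":  [1, 0, 0, 0] * 3,   # fires every fourth frame
    "return": [0, 1, 0, 0] * 3,   # always one frame after "serve" ("triggering")
    "stand":  [1] * 12,           # constant, carries no information
}

links = select_links(streams, threshold=0.2)
print(links)
```

With three streams there are six possible directed cross-stream links. On this toy data the heuristic retains only the link from "serve" to "return", which mirrors the "triggering" relationship, while the remaining candidates fall below the threshold.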
