论文信息 - Modeling Complex Temporal Composition of Actionlets for Activity Prediction

Modeling Complex Temporal Composition of Actionlets for Activity Prediction

Early prediction of ongoing activity has been more and more valuable in a large variety of time-critical applications. To build an effective representation for prediction, human activities can be characterized by a complex temporal composition of constituent simple actions. Different from early recognition on short-duration simple activities, we propose a novel framework for long-duration complex activity prediction by discovering the causal relationships between constituent actions and the predictable characteristics of activities. The major contributions of our work include: (1) we propose a novel activity decomposition method by monitoring motion velocity which encodes a temporal decomposition of long activities into a sequence of meaningful action units; (2) Probabilistic Suffix Tree (PST) is introduced to represent both large and small order Markov dependencies between action units; (3) we present a Predictive Accumulative Function (PAF) to depict the predictability of each kind of activity. The effectiveness of the proposed method is evaluated on two experimental scenarios: activities with middle-level complexity and activities with high-level complexity. Our method achieves promising results and can predict global activity classes and local action units.

Yun Fu | Jie Hu | Kang Li

[1] Juan Carlos Niebles,et al. Modeling Temporal Structure of Decomposable Motion Segments for Activity Classification , 2010, ECCV.

[2] Irfan A. Essa,et al. A novel sequence representation for unsupervised analysis of human activities , 2009, Artif. Intell..

[3] Cordelia Schmid,et al. Actom sequence models for efficient action detection , 2011, CVPR 2011.

[4] Ivan Laptev,et al. On Space-Time Interest Points , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[5] Cordelia Schmid,et al. Action recognition by dense trajectories , 2011, CVPR 2011.

[6] Jake K. Aggarwal,et al. Recognition of Composite Human Activities through Context-Free Grammar Based Representation , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[7] James Ferryman,et al. Proceedings of the thirteenth IEEE International Workshop on Performance Evaluation of Tracking and Surveillance , 2009 .

[8] Rama Chellappa,et al. From Videos to Verbs: Mining Videos for Activities using a Cascade of Dynamical Systems , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[9] Larry S. Davis,et al. Observing Human-Object Interactions: Using Spatial and Functional Compatibility for Recognition , 2009, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[10] Larry S. Davis,et al. Multi-agent event recognition in structured scenarios , 2011, CVPR 2011.

[11] Dana Ron,et al. The power of amnesia: Learning probabilistic automata with variable memory length , 1996, Machine Learning.

[12] Bohyung Han,et al. Scenario-based video event recognition by constraint flow , 2011, CVPR 2011.

[13] Sharath Pankanti,et al. Recognition of repetitive sequential human activity , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[14] Thomas Deselaers,et al. ClassCut for Unsupervised Class Segmentation , 2010, ECCV.

[15] Juan Carlos Niebles,et al. Unsupervised Learning of Human Action Categories Using Spatial-Temporal Words , 2006, BMVC.

[16] William Brendel,et al. Learning spatiotemporal graphs of human activities , 2011, 2011 International Conference on Computer Vision.

[17] Benjamin Z. Yao,et al. Unsupervised learning of event AND-OR grammar and semantics from video , 2011, 2011 International Conference on Computer Vision.

[18] Alan Fern,et al. Probabilistic event logic for interval-based event recognition , 2011, CVPR 2011.

[19] Michael S. Ryoo,et al. Human activity prediction: Early recognition of ongoing activities from streaming videos , 2011, 2011 International Conference on Computer Vision.

[20] Robert T. Collins,et al. An Open Source Tracking Testbed and Evaluation Web Site , 2005 .

[21] Aaron F. Bobick,et al. Recognition of Visual Activities and Interactions by Stochastic Parsing , 2000, IEEE Trans. Pattern Anal. Mach. Intell..

[22] Yunde Jia,et al. Parsing video events with goal inference and intent prediction , 2011, 2011 International Conference on Computer Vision.

[23] Ran El-Yaniv,et al. On Prediction Using Variable Order Markov Models , 2004, J. Artif. Intell. Res..