Paper Information - Cross-View Action Modeling, Learning, and Recognition

Cross-View Action Modeling, Learning, and Recognition

Existing methods for video-based action recognition are generally view-dependent, i.e., they perform recognition only from the views seen in the training data. We present a novel multiview spatio-temporal and-or graph (MST-AOG) representation for cross-view action recognition, i.e., recognition performed on video from an unknown and unseen view. As a compositional model, MST-AOG compactly represents the hierarchical combinatorial structures of cross-view actions by explicitly modeling variations in geometry, appearance, and motion. This paper proposes effective methods to learn the structure and parameters of MST-AOG. Inference based on MST-AOG enables action recognition from novel views. Training MST-AOG takes advantage of 3D human skeleton data obtained from Kinect cameras, which avoids annotating enormous numbers of multi-view video frames, an error-prone and time-consuming task; recognition, however, does not need 3D information and is based on 2D video input. A new Multiview Action3D dataset has been created and will be released. Extensive experiments demonstrate that this new action representation significantly improves the accuracy and robustness of cross-view action recognition on 2D videos.
