论文信息 - Global for Coarse and Part for Fine: A Hierarchical Action Recognition Framework

Global for Coarse and Part for Fine: A Hierarchical Action Recognition Framework

Action recognition is one significant yet challenging task in computer vision. Recent methods mainly model an end-to-end one-stage non-deep or deep learning networks to distinguish different action categories. In this paper we introduce one novel hierarchical action classification framework: Unlike existing one-stage recognition models, the proposed work improves the recognition accuracy by: 1) developing a hierarchical coarse-to-fine action classification framework by dividing the recognition processing into two stages: coarse- grained classification and fine-grained classification, and 2) representing actions in different stages with different granularity features representation: global features are utilized for coarse classifiers while more body parts patterns for fine-grained classifiers are aggregated. Experiments on two widely-tested benchmark datasets show that our method can achieve state-of-the-art or competitive performance compared with existing results using one-stage models, with advantages regarding the recognition accuracy and robustness.

[1] Lorenzo Torresani,et al. Learning Spatiotemporal Features with 3D Convolutional Networks , 2014, 2015 IEEE International Conference on Computer Vision (ICCV).

[2] James W. Davis,et al. The Recognition of Human Movement Using Temporal Templates , 2001, IEEE Trans. Pattern Anal. Mach. Intell..

[3] Chih-Jen Lin,et al. LIBSVM: A library for support vector machines , 2011, TIST.

[4] Tao Mei,et al. Look Closer to See Better: Recurrent Attention Convolutional Neural Network for Fine-Grained Image Recognition , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[5] Xiangyang Ji,et al. Action Recognition with Joint Attention on Multi-Level Deep Features , 2016, ArXiv.

[6] Shih-Fu Chang,et al. ConvNet Architecture Search for Spatiotemporal Feature Learning , 2017, ArXiv.

[7] Mubarak Shah,et al. A 3-dimensional sift descriptor and its application to action recognition , 2007, ACM Multimedia.

[8] Weiyao Lin,et al. Action Recognition with Coarse-to-Fine Deep Feature Integration and Asynchronous Fusion , 2018, AAAI.

[9] Cordelia Schmid,et al. Action Recognition with Improved Trajectories , 2013, 2013 IEEE International Conference on Computer Vision.

[10] Bernt Schiele,et al. A database for fine grained activity detection of cooking activities , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[11] Andrew Zisserman,et al. Two-Stream Convolutional Networks for Action Recognition in Videos , 2014, NIPS.

[12] Robinson Piramuthu,et al. HD-CNN: Hierarchical Deep Convolutional Neural Networks for Large Scale Visual Recognition , 2014, 2015 IEEE International Conference on Computer Vision (ICCV).

[13] Cordelia Schmid,et al. Towards Understanding Action Recognition , 2013, 2013 IEEE International Conference on Computer Vision.

[14] Andrew Zisserman,et al. Return of the Devil in the Details: Delving Deep into Convolutional Nets , 2014, BMVC.

[15] Richard P. Wildes,et al. Spatiotemporal Multiplier Networks for Video Action Recognition , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[16] John Scott Bridle,et al. Probabilistic Interpretation of Feedforward Classification Network Outputs, with Relationships to Statistical Pattern Recognition , 1989, NATO Neurocomputing.

[17] Gang Sun,et al. A Key Volume Mining Deep Framework for Action Recognition , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[18] Cordelia Schmid,et al. P-CNN: Pose-Based CNN Features for Action Recognition , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[19] Ming Yang,et al. 3D Convolutional Neural Networks for Human Action Recognition , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[20] Fei-Fei Li,et al. ImageNet: A large-scale hierarchical image database , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[21] Jitendra Malik,et al. Finding action tubes , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).