Paper Information - Multi-Temporal-Resolution Technique for Action Recognition using C3D: Experimental Study

In any video containing an action, motion conveys information complementary to the individual frames, and the speed of that motion varies across instances of similar actions. Training a separate deep-learning model for each action speed is therefore a promising approach. This paper explores two novel ideas: single-temporal-resolution single-model (STR-SM) and multi-temporal-resolution multi-model (MTR-MM). An STR-SM model is trained on one specific temporal resolution of the action dataset, which lets it accept a longer temporal range of frames as input and therefore classify actions faster. MTR-MM is a set of STR-SM models, each trained on a different temporal resolution and combined by late fusion with majority voting, which yields more accurate action recognition. Both improve on the traditional training approach, by 3.63% and 6% in video-wise accuracy, respectively.
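The late-fusion idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that a lower temporal resolution is obtained by keeping every k-th frame and that each per-resolution model returns a single class label per video; the abstract confirms neither detail. The names `subsample`, `majority_vote`, and `mtr_mm_predict` are hypothetical, and the toy classifiers merely stand in for trained C3D models.

```python
from collections import Counter
from typing import Callable, Dict, Hashable, List, Sequence


def subsample(frames: Sequence, stride: int) -> List:
    """Keep every `stride`-th frame (assumed way to lower temporal resolution)."""
    if stride < 1:
        raise ValueError("stride must be >= 1")
    return list(frames[::stride])


def majority_vote(labels: Sequence[Hashable]) -> Hashable:
    """Return the most frequent label; ties go to the label seen first."""
    if not labels:
        raise ValueError("need at least one prediction")
    return Counter(labels).most_common(1)[0][0]


def mtr_mm_predict(frames: Sequence,
                   models: Dict[int, Callable[[List], Hashable]]) -> Hashable:
    """Run one model per temporal stride and fuse their labels by majority vote.

    `models` maps a stride to a classifier for clips sampled at that stride.
    """
    votes = [model(subsample(frames, stride)) for stride, model in models.items()]
    return majority_vote(votes)


# Toy stand-ins for trained per-resolution models (hypothetical behaviour).
toy_models = {
    1: lambda clip: "walk" if len(clip) >= 10 else "run",
    2: lambda clip: "walk" if len(clip) >= 10 else "run",
    4: lambda clip: "walk" if len(clip) >= 10 else "run",
}

video = list(range(16))  # 16 placeholder "frames"
print(mtr_mm_predict(video, toy_models))  # prints "run" (votes: walk, run, run)
```

Using an odd number of models avoids ties in the two-class case; with more classes, the tie-breaking rule in `majority_vote` is an arbitrary choice of this sketch.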

Mohammed Marey | Howida A. Shedeed | Bassel S. Chawky
