Paper Information - Encoding Feature Maps of CNNs for Action Recognition

Encoding Feature Maps of CNNs for Action Recognition

We describe our approach to action classification in the THUMOS Challenge 2015. It combines two types of features: improved dense trajectories and CNN features. For the trajectory features, we extract HOG, HOF, MBHx, and MBHy descriptors and apply Fisher vector encoding. For the CNN features, we use a recent deep CNN model, VGG19, to capture appearance information, and we encode (pool) its convolutional feature maps with VLAD, which performs better than either average pooling of the feature maps or fully connected activation features. After concatenating the trajectory and CNN representations, we train a linear SVM classifier for each class in a one-vs-all scheme.
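The VLAD step mentioned in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The function name `vlad_encode`, the feature-map shape, the codebook size, and the power plus L2 normalization are all assumptions chosen for illustration, and the random arrays stand in for real VGG19 activations and a k-means codebook.

```python
import numpy as np


def vlad_encode(feature_map, centers):
    """Encode one convolutional feature map (C, H, W) as a VLAD vector.

    Each of the H*W spatial positions is treated as a C-dimensional local
    descriptor. `centers` is a (K, C) codebook, e.g. learned with k-means.
    """
    c, h, w = feature_map.shape
    descriptors = feature_map.reshape(c, h * w).T  # (H*W, C)

    # Hard-assign every descriptor to its nearest codeword.
    dists = ((descriptors[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    nearest = dists.argmin(axis=1)

    # Accumulate residuals (descriptor - codeword) per codeword.
    k = centers.shape[0]
    vlad = np.zeros((k, c))
    for i in range(k):
        members = descriptors[nearest == i]
        if len(members):
            vlad[i] = (members - centers[i]).sum(axis=0)

    vlad = vlad.ravel()
    # Assumed normalization: signed square root, then L2.
    vlad = np.sign(vlad) * np.sqrt(np.abs(vlad))
    norm = np.linalg.norm(vlad)
    return vlad / norm if norm > 0 else vlad


rng = np.random.default_rng(0)
feature_map = rng.standard_normal((512, 14, 14))  # stand-in for a conv feature map
centers = rng.standard_normal((8, 512))           # stand-in codebook (K = 8)
encoding = vlad_encode(feature_map, centers)
print(encoding.shape)
```

In contrast to average pooling, which collapses the map to a single C-dimensional mean, this keeps one residual vector per codeword, so the output has K × C dimensions.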

Cordelia Schmid | Xiaojiang Peng

[1] Cordelia Schmid, et al. The LEAR submission at Thumos 2014, 2014.

[2] Andrew Zisserman, et al. Very Deep Convolutional Networks for Large-Scale Image Recognition, 2014, ICLR.

[3] Cordelia Schmid, et al. A Robust and Efficient Video Representation for Action Recognition, 2015, International Journal of Computer Vision.

[4] Yu Qiao, et al. Hybrid Super Vector with Improved Dense Trajectories for Action Recognition, 2013.

[5] Cordelia Schmid, et al. Dense Trajectories and Motion Boundary Descriptors for Action Recognition, 2013, International Journal of Computer Vision.

[6] Yi Yang, et al. A discriminative CNN video representation for event detection, 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[7] Limin Wang, et al. Bag of Visual Words and Fusion Methods for Action Recognition: Comprehensive Study and Good Practice, 2022, Computer Vision and Image Understanding.

[8] Limin Wang, et al. Action recognition with trajectory-pooled deep-convolutional descriptors, 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).