Evaluating a bag-of-visual features approach using spatio-temporal features for action recognition

Abstract The detection of spatio-temporal interest points plays a key role in human action recognition algorithms. This work builds on the established strengths of the bag-of-visual-features model and presents a method for automatic action recognition in realistic and complex scenarios. It obtains a better feature representation by combining a well-known feature detector with a well-known descriptor, namely the 3D Harris space-time interest point detector and the 3D Scale-Invariant Feature Transform (3D SIFT) descriptor. Action videos are then represented as histograms of visual features, following the traditional bag-of-visual-features approach, and a support vector machine (SVM) classifier is used for training and testing. Extensive experiments on existing benchmark datasets show the effectiveness of the method and state-of-the-art performance: 68.1% mean Average Precision (mAP) on Hollywood-2, and 94% and 91.8% average accuracy on UCF Sports and KTH, respectively.
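The histogram step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a visual codebook is already available (in bag-of-visual-features pipelines this is commonly learned with k-means, which the abstract does not specify), uses toy 2-D vectors in place of real 3D SIFT descriptors, and leaves out interest point detection and the SVM. The names `nearest_word` and `bovf_histogram` are hypothetical.

```python
from math import dist  # Euclidean distance, available in Python 3.8+


def nearest_word(descriptor, codebook):
    """Return the index of the codebook centre closest to the descriptor."""
    return min(range(len(codebook)), key=lambda k: dist(descriptor, codebook[k]))


def bovf_histogram(descriptors, codebook):
    """Quantise local descriptors and return an L1-normalised word histogram."""
    counts = [0] * len(codebook)
    for d in descriptors:
        counts[nearest_word(d, codebook)] += 1
    total = sum(counts)
    # A video with no detected interest points maps to an all-zero histogram.
    return [c / total for c in counts] if total else [0.0] * len(codebook)


# Toy data (assumption): three visual words and four descriptors from one video.
codebook = [(0.0, 0.0), (1.0, 1.0), (4.0, 0.0)]
video_descriptors = [(0.1, -0.1), (0.9, 1.2), (1.1, 0.8), (3.7, 0.2)]

hist = bovf_histogram(video_descriptors, codebook)
print(hist)  # [0.25, 0.5, 0.25]
```

Each video yields one fixed-length vector regardless of how many interest points were detected, which is what allows a standard classifier such as an SVM to be trained on the histograms.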
