Human Action Recognition using Salient Region Detection in Complex Scenes

Although methods based on spatio-temporal interest points have shown promising results for human action recognition, they are not robust in complex scenes, especially in the presence of background clutter, camera motion, occlusions, and illumination variations. In this paper, we propose a novel method for classifying human actions in complex scenes. We suppress falsely detected interest points by detecting salient regions. Furthermore, we encode the features according to their spatio-temporal relationships. Our method is evaluated on two challenging datasets (UCF Sports and YouTube), and the experimental results show that it achieves better results than previous methods for human action recognition.
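To make the suppression step concrete, the following is a minimal sketch of how interest points might be filtered with a saliency map. It is not the implementation from the paper: the function name `filter_points_by_saliency`, the `(x, y, t)` point layout, the per-frame saliency array normalized to [0, 1], and the fixed threshold of 0.5 are all assumptions made for illustration.

```python
import numpy as np


def filter_points_by_saliency(points, saliency, threshold=0.5):
    """Keep only interest points that lie inside salient regions.

    points:    (N, 3) integer array of (x, y, t) detections (assumed layout).
    saliency:  (T, H, W) float array, one saliency map per frame,
               assumed to be normalized to [0, 1].
    threshold: hypothetical cutoff; a real system would tune or learn it.
    """
    points = np.asarray(points, dtype=int).reshape(-1, 3)
    x, y, t = points[:, 0], points[:, 1], points[:, 2]
    # Look up the saliency value under each point (row index is y, column is x).
    keep = saliency[t, y, x] >= threshold
    return points[keep]


# Toy example: two 4x4 frames whose central 2x2 block is salient.
saliency = np.zeros((2, 4, 4))
saliency[:, 1:3, 1:3] = 1.0

points = np.array([
    [1, 1, 0],  # inside the salient block in frame 0
    [3, 0, 0],  # background in frame 0
    [2, 2, 1],  # inside the salient block in frame 1
    [0, 3, 1],  # background in frame 1
])

kept = filter_points_by_saliency(points, saliency)
print(kept)
```

In this toy case the two background points are discarded and the two points inside the salient block are kept. Any detector and any saliency model could supply the inputs; the sketch only shows the masking idea.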
