Action recognition based on semantic feature description and cross classification

Action recognition is a challenging topic in computer vision. In this work, we present a novel method for action recognition based on two contributions: semantic feature description and cross classification. The designed descriptor combines several local 3D-SIFT descriptors and is informative and distinctive, reflecting the spatio-temporal cues of the video. The cross classification effectively combines feature localization and action categorization. The proposed method is evaluated on the popular UCF50 dataset, and the experimental results demonstrate that it outperforms state-of-the-art competitors.
