Feature Similarity and Frequency-Based Weighted Visual Words Codebook Learning Scheme for Human Action Recognition

Human action recognition has become a popular research area in computer vision over the past decade. This paper presents a human action recognition scheme based on a textual information concept inspired by document retrieval systems. Videos are represented with a commonly used local feature representation. In addition, we formulate a new weighted class-specific dictionary learning scheme that reflects the importance of each visual word for a particular action class. Weighted class-specific dictionary learning allows the scheme to learn a sparse representation for a particular action class. To evaluate the scheme in realistic and complex scenarios, we tested it on the UCF Sports and UCF11 benchmark datasets. The reported experimental results outperform recent state-of-the-art methods, with average accuracies of 98.93% on UCF Sports and 93.88% on UCF11. To the best of our knowledge, this contribution is the first to apply a weighted class-specific dictionary learning method to realistic human action recognition datasets.
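The idea of weighting visual words per action class, borrowed from document retrieval, can be illustrated with a TF-IDF-style computation. The sketch below is only an illustration under stated assumptions, not the weighting formula of this paper: it assumes local features have already been quantized into visual word indices, treats each action class as one "document", and uses the hypothetical helper names `bow_histograms` and `class_word_weights`.

```python
import numpy as np


def bow_histograms(word_ids_per_video, vocab_size):
    """Count how often each visual word occurs in each video."""
    return np.stack([
        np.bincount(ids, minlength=vocab_size).astype(float)
        for ids in word_ids_per_video
    ])


def class_word_weights(histograms, labels, num_classes):
    """TF-IDF-style weight of every visual word for every action class.

    Assumption (not taken from the paper): term frequency is a word's share
    of all word occurrences in a class, and the inverse "document" frequency
    treats each action class as a single document.
    """
    labels = np.asarray(labels)
    # Sum the per-video histograms of each class: shape (num_classes, vocab).
    class_counts = np.stack([
        histograms[labels == c].sum(axis=0) for c in range(num_classes)
    ])
    # Term frequency within each class; guard against empty classes.
    totals = np.maximum(class_counts.sum(axis=1, keepdims=True), 1.0)
    tf = class_counts / totals
    # Smoothed inverse class frequency: words seen in fewer classes score higher.
    classes_with_word = (class_counts > 0).sum(axis=0)
    idf = np.log((1.0 + num_classes) / (1.0 + classes_with_word)) + 1.0
    return tf * idf


# Toy data: two videos, one per class, with a vocabulary of three visual words.
word_ids = [np.array([0, 0, 1]), np.array([0, 2, 2])]
labels = [0, 1]
weights = class_word_weights(bow_histograms(word_ids, 3), labels, 2)
```

In this toy example, word 0 occurs in both classes and so receives no boost from the inverse class frequency, while word 2 occurs only in class 1 and is weighted more heavily there. Weights of this kind could then be used to rescale a class-specific histogram before classification.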
