Sparsity-inducing dictionaries for effective action classification

Action recognition in unconstrained videos is one of the most important challenges in computer vision. In this paper, we propose sparsity-inducing dictionaries as an effective representation for action classification in videos. We demonstrate that features obtained from a sparsity-based representation provide discriminative information useful for classifying action videos into action classes. We show that the constructed dictionaries are distinct for a large number of action classes, resulting in a significant improvement in classification accuracy on the HMDB51 dataset. We further demonstrate the efficacy of dictionaries and sparsity-based classification on other large action video datasets such as UCF50.

Highlights

Sparsity-inducing dictionaries are an effective representation for action classification in videos.

Features obtained from a sparsity-based representation provide enough discriminative information for classifying action videos.

The constructed dictionaries are distinct for a large number of action classes, resulting in a significant improvement in classification accuracy.
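The abstract does not spell out the classification pipeline, so the following is only a minimal sketch of one common way to classify with class-specific dictionaries: sparse-code a test feature against each class dictionary and pick the class with the smallest reconstruction residual. Everything here is an assumption for illustration, not the paper's method: the dictionaries are simply normalized training features rather than learned atoms, the sparse coder is a small orthogonal matching pursuit, the features are synthetic vectors rather than video descriptors, and the names `omp`, `build_dictionaries`, `classify`, and the class labels are hypothetical.

```python
import numpy as np


def omp(D, x, n_nonzero):
    """Orthogonal matching pursuit: approximate x with at most n_nonzero columns of D."""
    residual = x.copy()
    support = []
    sol = np.zeros(0)
    for _ in range(n_nonzero):
        # Pick the atom most correlated with the current residual.
        corr = np.abs(D.T @ residual)
        if support:
            corr[support] = 0.0  # never reselect an atom
        k = int(np.argmax(corr))
        if corr[k] < 1e-12:
            break  # residual is already (numerically) explained
        support.append(k)
        # Refit all selected atoms jointly by least squares.
        sol, *_ = np.linalg.lstsq(D[:, support], x, rcond=None)
        residual = x - D[:, support] @ sol
    coef = np.zeros(D.shape[1])
    if support:
        coef[support] = sol
    return coef


def build_dictionaries(train):
    """Assumption: each class dictionary is its training features, scaled to unit norm."""
    return {
        label: X / np.linalg.norm(X, axis=0, keepdims=True)
        for label, X in train.items()
    }


def classify(dictionaries, x, n_nonzero=3):
    """Return the label whose dictionary reconstructs x with the smallest residual."""
    residuals = {}
    for label, D in dictionaries.items():
        code = omp(D, x, n_nonzero)
        residuals[label] = float(np.linalg.norm(x - D @ code))
    return min(residuals, key=residuals.get), residuals


# Synthetic stand-in for video features: each class lies in its own 3-D subspace of R^20.
rng = np.random.default_rng(0)
basis = {"wave": rng.normal(size=(20, 3)), "run": rng.normal(size=(20, 3))}
train = {label: B @ rng.normal(size=(3, 10)) for label, B in basis.items()}
dictionaries = build_dictionaries(train)

# A test feature drawn from the "wave" subspace, with a little additive noise.
x_test = basis["wave"] @ rng.normal(size=3) + 0.01 * rng.normal(size=20)
predicted, residuals = classify(dictionaries, x_test)
print(predicted, residuals)
```

In this toy setting the test feature is well explained by a few atoms from its own class dictionary and poorly by atoms from the other class, which is the sense in which distinct per-class dictionaries can carry discriminative information. A learned dictionary (for example via K-SVD or online dictionary learning) could replace `build_dictionaries` without changing the residual-based decision rule.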
