A hierarchical representation for human action recognition in realistic scenes

Action representation based on bag-of-features (BoF) statistics of local space-time features is popular for human action recognition because of its simplicity. However, large quantization error and weak semantic representation reduce the discriminative ability of the traditional BoF model when it is applied to human action recognition in realistic scenes. To address these problems, we investigate the generalization ability of the BoF framework for action representation, as well as more effective feature encoding of high-level semantics. To this end, we present a two-layer hierarchical codebook learning framework for human action classification in realistic scenes. In the first layer (action modelling), a superpixel GMM (Gaussian mixture model) is developed to filter out the noisy features that cluttered backgrounds introduce into space-time interest point (STIP) extraction, and a class-specific learning strategy is applied to the refined STIP feature space to construct compact and descriptive in-class action codebooks. In the second layer (action representation), an LDA-Km learning algorithm is proposed to reduce feature dimensionality and to acquire a more discriminative inter-class action codebook for classification. We exploit the representational power of the hierarchical framework and the efficiency of the BoF model to boost recognition performance in realistic scenes. The proposed method is evaluated on four benchmark datasets: KTH, YouTube (UCF11), UCF Sports and Hollywood2. Experimental results show that the proposed approach achieves higher recognition accuracy than the baseline method, and comparisons with state-of-the-art works show that it is competitive in both recognition performance and time complexity.
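To make the BoF encoding and the idea of class-specific (in-class) codebooks concrete, here is a minimal Python sketch. It assumes plain k-means codebooks, hard assignment of each local descriptor to its nearest codeword, and per-class codebooks that are simply concatenated into one vocabulary; these are illustrative assumptions, not the authors' implementation. The names `kmeans`, `class_specific_codebook` and `bof_histogram` are hypothetical, and neither the superpixel GMM filtering nor the LDA-Km second layer is modelled here.

```python
import random
from typing import Dict, List, Sequence

Vector = List[float]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two descriptors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(codebook: Sequence[Sequence[float]], x: Sequence[float]) -> int:
    """Index of the codeword closest to descriptor x (hard quantization)."""
    return min(range(len(codebook)), key=lambda i: sq_dist(codebook[i], x))


def kmeans(points: Sequence[Vector], k: int, iters: int = 20, seed: int = 0) -> List[Vector]:
    """Plain Lloyd's k-means; the k centroids serve as codewords."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(list(points), k)]
    for _ in range(iters):
        clusters: List[List[Vector]] = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(centroids, p)].append(p)
        for i, members in enumerate(clusters):
            if members:  # keep the previous centroid if a cluster becomes empty
                dim = len(members[0])
                centroids[i] = [sum(m[d] for m in members) / len(members) for d in range(dim)]
    return centroids


def class_specific_codebook(features_by_class: Dict[str, List[Vector]],
                            k_per_class: int) -> List[Vector]:
    """Hypothetical in-class codebooks: k-means per action class, concatenated in label order."""
    codebook: List[Vector] = []
    for label in sorted(features_by_class):
        codebook.extend(kmeans(features_by_class[label], k_per_class))
    return codebook


def bof_histogram(video_features: Sequence[Vector], codebook: Sequence[Vector]) -> List[float]:
    """L1-normalised histogram of codeword assignments for one video."""
    hist = [0.0] * len(codebook)
    for f in video_features:
        hist[nearest(codebook, f)] += 1.0
    total = sum(hist)
    return [h / total for h in hist] if total else hist


if __name__ == "__main__":
    # Toy 2-D "descriptors" for two action classes (synthetic, for illustration only).
    rng = random.Random(1)
    train = {
        "walk": [[rng.gauss(0, 0.1), rng.gauss(0, 0.1)] for _ in range(50)],
        "run": [[rng.gauss(5, 0.1), rng.gauss(5, 0.1)] for _ in range(50)],
    }
    codebook = class_specific_codebook(train, k_per_class=2)
    video = [[rng.gauss(5, 0.1), rng.gauss(5, 0.1)] for _ in range(10)]
    print(bof_histogram(video, codebook))
```

Hard assignment is one source of the quantization error mentioned above: every descriptor contributes a full count to a single codeword, however far it lies from that codeword.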
