A visual cognizance based multi-resolution descriptor for human action recognition using key pose

Abstract Human activity recognition from video sequences is a well-known problem with many real-life applications, such as daily assistive living, security and surveillance, patient monitoring, robotics, and sports analysis. Recently, action recognition based on single, still images has become popular because an image carries spatial cues and requires less computation. Hence, a robust framework is constructed by computing textural and spatial cues of still images at multiple resolutions. A fuzzy inference model is used to select a single key image from each action video sequence using the maximum histogram distance between stacks of frames. To represent these key pose images, textural traits at various orientations and scales are extracted using Gabor wavelets, while shape traits are computed through a multilevel approach called Spatial Edge Distribution of Gradients (SEDGs). Finally, a hybrid action descriptor, known as the Extended Multi-Resolution Features (EMRFs) model, is developed from the shape and textural evidence. The highest classification accuracy is achieved with an SVM classifier on various human action datasets: Weizmann Action (100%), KTH (95.35%), Ballet (92.75%), and UCF YouTube (96.36%). The highest accuracy achieved on each of these datasets is compared with similar state-of-the-art approaches, and EMRFs show superior performance.
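The key-image selection step can be illustrated with a minimal sketch. The abstract does not specify the fuzzy inference model or how frames are stacked, so the code below is only a plain stand-in under stated assumptions: each frame is a grayscale image given as nested lists of 0..255 integers, the distance is the L1 distance between normalized intensity histograms of consecutive frames, and the frame that differs most from its predecessor is returned. The names `histogram`, `l1_distance`, and `select_key_frame` are hypothetical and do not come from the paper.

```python
from typing import List, Sequence

# Assumption: a frame is a grayscale image stored as rows of integers in 0..255.
Frame = Sequence[Sequence[int]]


def histogram(frame: Frame, bins: int = 16) -> List[float]:
    """Return the normalized intensity histogram of one frame."""
    counts = [0] * bins
    total = 0
    for row in frame:
        for value in row:
            # Map an intensity in 0..255 to one of `bins` equal-width bins.
            counts[min(value * bins // 256, bins - 1)] += 1
            total += 1
    if total == 0:
        raise ValueError("frame contains no pixels")
    return [c / total for c in counts]


def l1_distance(h1: Sequence[float], h2: Sequence[float]) -> float:
    """Return the L1 distance between two histograms of equal length."""
    return sum(abs(a - b) for a, b in zip(h1, h2))


def select_key_frame(frames: Sequence[Frame], bins: int = 16) -> int:
    """Return the index of the frame farthest from its predecessor.

    This is a simplified stand-in for the fuzzy selection described in the
    abstract: it uses a hard maximum over consecutive-frame distances.
    """
    if len(frames) < 2:
        raise ValueError("need at least two frames")
    hists = [histogram(f, bins) for f in frames]
    distances = [l1_distance(hists[i - 1], hists[i]) for i in range(1, len(hists))]
    # distances[i] compares frame i with frame i + 1, hence the offset.
    best = max(range(len(distances)), key=distances.__getitem__)
    return best + 1


if __name__ == "__main__":
    dark = [[10, 10], [10, 10]]
    bright = [[200, 200], [200, 200]]
    print(select_key_frame([dark, dark, bright, bright]))
```

In this toy sequence the only change in intensity distribution happens between the second and third frames, so the third frame (index 2) is selected. A fuzzy model would replace the hard maximum with graded membership over the distances.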
