A new hybrid deep learning model for human action recognition

Abstract Human behavior has always been an important factor in social communication, and recognizing human activities and actions provides clues that facilitate its analysis. Human action recognition is an important challenge in a variety of applications, including human-computer interaction and intelligent video surveillance for enhancing security in different domains. How well a recognition algorithm performs depends on proper feature extraction and on learning from the data. The success of deep learning has led to impressive results in many contexts involving neural networks. In particular, with increased computational power, Gated Recurrent Neural Networks are being adopted for sequential data and video classification. However, an efficient classifier needs a strong feature vector in order to assign class labels. Features carry the most important information in the data, and feature extraction influences both the performance of the algorithm and its computational complexity. This paper proposes a novel approach to human action recognition based on a hybrid deep learning model. The proposed approach is evaluated on the challenging UCF Sports, UCF101 and KTH datasets. On the KTH dataset, it obtains an average of 96.3%.
