SV-NET: A Deep Learning Approach to Video Based Human Activity Recognition

Human Activity Recognition (HAR) is the automatic identification of physical activities performed by people. It aims to infer the actions of one or more persons from a set of observations captured by sensors, videos, or still images. Recognizing human activities from video sequences is challenging because of background clutter, partial occlusion, and changes in scale, viewpoint, lighting, and appearance. In this paper, we propose a Convolutional Neural Network (CNN) model, named SV-NET, that classifies human activities directly from RGB videos. The proposed model has been tested on three benchmark video datasets: KTH, UCF11, and HMDB51. The results demonstrate improved performance over some existing deep-learning-based models.
