Exploiting the deep learning paradigm for recognizing human actions

In this paper we propose a novel method for recognizing human actions by exploiting a multi-layer representation based on a deep learning based architecture. A first level feature vector is extracted and then a high level representation is obtained by taking advantage of a Deep Belief Network trained using a Restricted Boltzmann Machine. The classification is finally performed by a feed-forward neural network. The main advantage behind the proposed approach lies in the fact that the high level representation is automatically built by the system exploiting the regularities in the dataset; given a suitably large dataset, it can be expected that such a representation can outperform a hand-design description scheme. The proposed approach has been tested on two standard datasets and the achieved results, compared with state of the art algorithms, confirm its effectiveness.

[1]  Nihat Ay,et al.  Refinements of Universal Approximation Results for Deep Belief Networks and Restricted Boltzmann Machines , 2010, Neural Computation.

[2]  Alessia Saggese,et al.  Dynamic Scene Understanding for Behavior Analysis Based on String Kernels , 2014, IEEE Transactions on Circuits and Systems for Video Technology.

[3]  Il Hong Suh,et al.  Empirical Evaluation on Deep Learning of Depth Feature for Human Activity Recognition , 2013, ICONIP.

[4]  Christian Wolf,et al.  Sequential Deep Learning for Human Action Recognition , 2011, HBU.

[5]  Ming Yang,et al.  3D Convolutional Neural Networks for Human Action Recognition , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[6]  Geoffrey E. Hinton A Practical Guide to Training Restricted Boltzmann Machines , 2012, Neural Networks: Tricks of the Trade.

[7]  R. Venkatesh Babu,et al.  Human action recognition using depth maps , 2012, 2012 International Conference on Signal Processing and Communications (SPCOM).

[8]  Serge J. Belongie,et al.  Behavior recognition via sparse spatio-temporal features , 2005, 2005 IEEE International Workshop on Visual Surveillance and Performance Evaluation of Tracking and Surveillance.

[9]  Quoc V. Le,et al.  Learning hierarchical invariant spatio-temporal features for action recognition with independent subspace analysis , 2011, CVPR 2011.

[10]  Alessia Saggese,et al.  Combining Neural Networks and Fuzzy Systems for Human Behavior Understanding , 2012, 2012 IEEE Ninth International Conference on Advanced Video and Signal-Based Surveillance.

[11]  Ruzena Bajcsy,et al.  Berkeley MHAD: A comprehensive Multimodal Human Action Database , 2013, 2013 IEEE Workshop on Applications of Computer Vision (WACV).

[12]  Alessia Saggese,et al.  Recognizing Human Actions by a Bag of Visual Words , 2013, 2013 IEEE International Conference on Systems, Man, and Cybernetics.

[13]  이상헌,et al.  Deep Belief Networks , 2010, Encyclopedia of Machine Learning.

[14]  Nuno Vasconcelos,et al.  Recognizing Activities via Bag of Words for Attribute Dynamics , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[15]  Yoshua. Bengio,et al.  Learning Deep Architectures for AI , 2007, Found. Trends Mach. Learn..

[16]  Geoffrey E. Hinton,et al.  Deep Boltzmann Machines , 2009, AISTATS.

[17]  Pascal Vincent,et al.  Representation Learning: A Review and New Perspectives , 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[18]  Nicolas Le Roux,et al.  Representational Power of Restricted Boltzmann Machines and Deep Belief Networks , 2008, Neural Computation.

[19]  Qiang Wu,et al.  Human Action Recognition Based on Radon Transform , 2011, Multimedia Analysis, Processing and Communications.

[20]  Ronald Poppe,et al.  A survey on vision-based human action recognition , 2010, Image Vis. Comput..

[21]  Alessia Saggese,et al.  HAck: A system for the recognition of human actions by kernels of visual strings , 2014, 2014 11th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS).

[22]  Anupam Agrawal,et al.  A survey on activity recognition and behavior understanding in video surveillance , 2012, The Visual Computer.

[23]  Haibin Ling,et al.  3D R Transform on Spatio-temporal Interest Points for Action Recognition , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[24]  Alessia Saggese,et al.  Recognition of Human Actions from RGB-D Videos Using a Reject Option , 2013, ICIAP Workshops.

[25]  Carlo Gatta,et al.  Do We Really Need All These Neurons? , 2013, IbPRIA.