Mixture of Deep-Based Representation and Shallow Classifiers to Recognize Human Activities

Human action recognition is an active research topic that has attracted many researchers because of its wide range of important applications. Following the success of deep learning-based approaches in fields such as machine vision, object detection, and natural language processing, researchers have recently moved from conventional hand-crafted techniques to deep learning methods. This study proposes a high-performance method for recognizing human actions based on transfer learning: a pre-trained deep neural network is first used to extract features from the target dataset, and a softmax classifier then classifies the actions. The results indicate that knowledge learned on a large dataset transfers well to action recognition with a limited dataset. The proposed approach is evaluated on two datasets, KTH and UCF Sports Action. A comparative study shows that it is more accurate than both hand-crafted feature extraction techniques and other deep learning-based techniques. In addition, because the model has fewer parameters, the method is suitable for human action recognition applications on mobile devices.
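
The two-stage pipeline described above (a frozen feature extractor followed by a trainable softmax classifier) can be illustrated with a minimal sketch. It is not the authors' implementation: the fixed matrix `FROZEN_W` is a hypothetical stand-in for a pre-trained deep network, the 4-dimensional toy inputs stand in for video frames, and the two class indices are illustrative only. The point it shows is that only the classifier's weights are updated during training, while the extractor stays unchanged.

```python
import math
import random

# Hypothetical stand-in for a pre-trained network: fixed weights, never updated.
FROZEN_W = [
    [0.5, -0.2, 0.1, 0.3],
    [-0.3, 0.4, 0.2, 0.1],
    [0.2, 0.2, -0.1, 0.4],
]

def extract_features(x):
    """Frozen feature extractor: a fixed linear map followed by ReLU."""
    return [max(0.0, sum(w * v for w, v in zip(row, x))) for row in FROZEN_W]

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def class_probs(W, b, f):
    """Class probabilities of the softmax classifier for one feature vector."""
    logits = [sum(w * v for w, v in zip(W[k], f)) + b[k] for k in range(len(W))]
    return softmax(logits)

def train_softmax(features, labels, num_classes, lr=0.1, epochs=300):
    """Full-batch gradient descent on the cross-entropy loss."""
    dim = len(features[0])
    n = len(features)
    W = [[0.0] * dim for _ in range(num_classes)]
    b = [0.0] * num_classes
    for _ in range(epochs):
        grad_W = [[0.0] * dim for _ in range(num_classes)]
        grad_b = [0.0] * num_classes
        for f, y in zip(features, labels):
            p = class_probs(W, b, f)
            for k in range(num_classes):
                err = p[k] - (1.0 if k == y else 0.0)
                grad_b[k] += err / n
                for j in range(dim):
                    grad_W[k][j] += err * f[j] / n
        for k in range(num_classes):
            b[k] -= lr * grad_b[k]
            for j in range(dim):
                W[k][j] -= lr * grad_W[k][j]
    return W, b

def predict(W, b, x):
    """Extract frozen features, then return the most probable class index."""
    p = class_probs(W, b, extract_features(x))
    return max(range(len(p)), key=lambda k: p[k])

# Toy target dataset (an assumption for illustration): two well-separated classes.
rng = random.Random(0)
inputs, labels = [], []
for _ in range(20):
    inputs.append([rng.uniform(-0.1, 0.1) for _ in range(4)])
    labels.append(0)
    inputs.append([3.0 + rng.uniform(-0.1, 0.1) for _ in range(4)])
    labels.append(1)

# Stage 1: features come from the frozen extractor. Stage 2: train only the classifier.
features = [extract_features(x) for x in inputs]
W, b = train_softmax(features, labels, num_classes=2)
accuracy = sum(predict(W, b, x) == y for x, y in zip(inputs, labels)) / len(inputs)
print(f"training accuracy: {accuracy:.2f}")
```

In a realistic setting the stand-in extractor would be replaced by the convolutional layers of a network pre-trained on a large image dataset, and the features could be computed once and cached, since they do not change during classifier training.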
