Human Action Recognition using Machine Learning in Uncontrolled Environment

Video-based Human Action Recognition (HAR) is an active research field within Machine Learning (ML), and human detection in videos is the most important step in action recognition. Several techniques and algorithms have recently been proposed to increase the accuracy of the HAR process, but a margin of improvement still exists. Detecting and classifying human actions is challenging due to random variations in human appearance, clothing, illumination, and background. This article proposes an efficient technique to classify human actions through a pipeline of steps: removing redundant frames from videos, extracting Segments of Interest (SoIs), and mining feature descriptors via Geodesic Distance (GD), 3D Cartesian-plane Features (3D-CF), Joints MOCAP (JMOCAP), and n-way Point Trajectory Generation (nPTG). A Neuro-Fuzzy Classifier (NFC) is then used for the final classification. The proposed technique is tested on two publicly available datasets, HMDB-51 and Hollywood2, achieving accuracies of 82.55% and 91.99%, respectively. These results demonstrate the validity of the proposed model.
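The abstract only names the pipeline stages, not their exact algorithms, so the following is a minimal illustrative sketch under stated assumptions: redundant-frame removal is approximated by a mean-absolute-difference threshold between consecutive frames, and the GD descriptor is stood in for by a cumulative Euclidean path length over joint positions. Both functions (`remove_redundant_frames`, `geodesic_distance_features`) and the threshold value are hypothetical placeholders, not the authors' implementation.

```python
import numpy as np

def remove_redundant_frames(frames, threshold=5.0):
    # Hypothetical criterion: keep a frame only if its mean absolute
    # pixel difference from the last kept frame exceeds `threshold`.
    kept = [frames[0]]
    for f in frames[1:]:
        if np.abs(f.astype(float) - kept[-1].astype(float)).mean() > threshold:
            kept.append(f)
    return kept

def geodesic_distance_features(joints):
    # Toy stand-in for a Geodesic Distance descriptor: cumulative
    # Euclidean path length along consecutive joint positions.
    diffs = np.diff(joints, axis=0)
    return np.cumsum(np.linalg.norm(diffs, axis=1))

# Demo on synthetic data: three identical blank frames plus one bright frame.
frames = [np.zeros((4, 4), dtype=np.uint8)] * 3 + [np.full((4, 4), 255, dtype=np.uint8)]
print(len(remove_redundant_frames(frames)))  # 2: duplicate frames dropped
```

In a full system, the retained frames would feed the SoI extraction and descriptor-mining stages before the descriptors are concatenated for the classifier.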
