Robust Human Activity Recognition Using Multimodal Feature-Level Fusion

Automated recognition of human activities has great significance owing to its wide-ranging applications, including surveillance, robotics, and personal health monitoring. Over the past few years, many computer vision-based methods have been developed for recognizing human actions from RGB and depth camera videos. These methods include space-time trajectories, motion encoding, key-pose extraction, space-time occupancy patterns, depth motion maps, and skeleton joints. However, such camera-based approaches are affected by background clutter and illumination changes and are applicable only to a limited field of view. Wearable inertial sensors provide a viable alternative to these challenges but are subject to their own limitations, such as sensitivity to on-body location and orientation. Because the data obtained from cameras and inertial sensors are complementary, the use of multiple sensing modalities for accurate recognition of human actions is steadily increasing. This paper presents a viable multimodal feature-level fusion approach for robust human action recognition, which utilizes data from multiple sensors: an RGB camera, a depth sensor, and wearable inertial sensors. We extract computationally efficient features from the RGB-D video and the inertial body-sensor data, namely densely extracted histogram of oriented gradients (HOG) features from the RGB/depth videos and statistical signal attributes from the wearable sensor data. The proposed human action recognition (HAR) framework is evaluated on the publicly available multimodal human action dataset UTD-MHAD, which comprises 27 different human actions. K-nearest neighbor (KNN) and support vector machine (SVM) classifiers are used for training and testing the proposed fusion model. The experimental results indicate that the proposed scheme achieves better recognition results than the state of the art. The feature-level fusion of RGB and inertial sensor features provides the best overall performance for the proposed system, with an accuracy of 97.6%.
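To make the described pipeline concrete, the following is a minimal Python sketch of feature-level fusion under stated assumptions; it is an illustration, not the paper's implementation. The frame size, HOG parameters (orientations, cell and block sizes), the particular statistical attributes, and the temporal averaging of per-frame descriptors are placeholder choices not reported in the abstract; the input arrays stand in for hypothetical depth-frame and inertial-signal loaders.

```python
# Illustrative sketch of feature-level fusion for HAR (assumptions noted above;
# not the authors' exact method). Uses scikit-image and scikit-learn.
import numpy as np
from skimage.feature import hog
from skimage.transform import resize
from sklearn.svm import SVC

def video_hog_features(frames, size=(64, 64)):
    """Densely extracted HOG descriptors, averaged over the frames of one clip.

    frames: iterable of 2-D grayscale/depth images.
    """
    descriptors = []
    for frame in frames:
        img = resize(frame, size, anti_aliasing=True)
        descriptors.append(hog(img, orientations=9,
                               pixels_per_cell=(8, 8),
                               cells_per_block=(2, 2)))
    return np.mean(descriptors, axis=0)

def inertial_stat_features(signal):
    """Per-axis statistical attributes of an (n_samples, n_axes) inertial signal."""
    feats = [signal.mean(axis=0), signal.std(axis=0),
             signal.min(axis=0), signal.max(axis=0),
             np.sqrt((signal ** 2).mean(axis=0))]  # root mean square
    return np.concatenate(feats)

def fused_feature(frames, signal):
    """Feature-level fusion: concatenate the modality-specific feature vectors."""
    return np.concatenate([video_hog_features(frames),
                           inertial_stat_features(signal)])

# Hypothetical training loop over (frames, signal, label) samples:
# X = np.stack([fused_feature(f, s) for f, s, _ in samples])
# y = np.array([label for _, _, label in samples])
# clf = SVC(kernel="rbf").fit(X, y)
```

Substituting sklearn.neighbors.KNeighborsClassifier for SVC gives the KNN variant; in either case, fusion at the feature level simply means concatenating the per-modality vectors before a single classifier is trained.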
