Simultaneous Utilization of Inertial and Video Sensing for Action Detection and Recognition in Continuous Action Streams

This paper describes the simultaneous utilization of inertial and video sensing to achieve human action detection and recognition in continuous action streams, that is, streams in which actions of interest are performed at random among actions of non-interest. The inertial and video data are captured simultaneously via a wearable inertial sensor and a video camera and converted into 2D and 3D images, respectively. These images are then fed into a 2D and a 3D convolutional neural network, and the decisions of the two networks are fused in order to detect and recognize a specified set of actions of interest within continuous action streams. The developed fusion approach is applied to two sets of actions of interest consisting of smart TV gestures and sports actions. The results indicate that the fusion approach is more effective than either sensing modality used individually: its average accuracy is 5.8% above inertial and 14.3% above video for the smart TV gestures, and 23.2% above inertial and 1.9% above video for the sports actions.
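The paper's exact network architectures and fusion rule are not reproduced here. As a rough illustration of the decision-level fusion idea described above, the following minimal PyTorch sketch runs a placeholder 2D CNN on an inertial-signal image and a placeholder 3D CNN on a stack of video frames, then averages their per-class probabilities. `Small2DCNN`, `Small3DCNN`, `fuse_decisions`, `NUM_CLASSES`, the layer choices, and the input shapes are all hypothetical, not the authors' actual design.

```python
# Minimal sketch of decision-level fusion of a 2D CNN (inertial images)
# and a 3D CNN (video volumes). All architectures and shapes are
# illustrative assumptions, not the paper's networks.
import torch
import torch.nn as nn
import torch.nn.functional as F

NUM_CLASSES = 6  # hypothetical: five actions of interest + one non-interest class

class Small2DCNN(nn.Module):
    """Placeholder 2D CNN for images formed from inertial signals."""
    def __init__(self, num_classes=NUM_CLASSES):
        super().__init__()
        self.conv = nn.Conv2d(1, 8, kernel_size=3, padding=1)
        self.fc = nn.Linear(8 * 32 * 32, num_classes)  # assumes 32x32 inputs

    def forward(self, x):
        x = F.relu(self.conv(x))
        return self.fc(x.flatten(1))

class Small3DCNN(nn.Module):
    """Placeholder 3D CNN for stacked video frames (a 3D image volume)."""
    def __init__(self, num_classes=NUM_CLASSES):
        super().__init__()
        self.conv = nn.Conv3d(3, 8, kernel_size=3, padding=1)
        self.pool = nn.AdaptiveAvgPool3d(1)
        self.fc = nn.Linear(8, num_classes)

    def forward(self, x):
        x = F.relu(self.conv(x))
        return self.fc(self.pool(x).flatten(1))

def fuse_decisions(logits_2d, logits_3d, w_inertial=0.5):
    """Fuse the two networks' decisions by averaging their softmax
    probabilities, then return the fused class prediction."""
    p2d = F.softmax(logits_2d, dim=1)
    p3d = F.softmax(logits_3d, dim=1)
    fused = w_inertial * p2d + (1.0 - w_inertial) * p3d
    return fused.argmax(dim=1), fused

# Usage on dummy inputs: one 32x32 inertial image and one
# 16-frame, 32x32 RGB video clip (batch size 1).
inertial_img = torch.randn(1, 1, 32, 32)
video_clip = torch.randn(1, 3, 16, 32, 32)
pred, probs = fuse_decisions(Small2DCNN()(inertial_img), Small3DCNN()(video_clip))
```

An equal-weight average is only one possible fusion rule; in practice the weight `w_inertial` would be tuned on validation data, and sliding windows over the continuous stream would supply the inputs so that non-interest segments can be rejected.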
