A survey of depth and inertial sensor fusion for human action recognition

Several review and survey articles have previously appeared on human action recognition in which either vision sensors or inertial sensors are used individually. Since each sensor modality has its own limitations, a number of published papers have shown that fusing vision and inertial sensor data improves recognition accuracy. This survey provides an overview of recent investigations in which vision and inertial sensors are used simultaneously to perform human action recognition more effectively. Its thrust is on depth cameras and inertial sensors, as these two types of sensors are cost-effective, commercially available, and, more significantly, both provide 3D human action data. An overview of the components needed to fuse depth and inertial sensor data is given, along with a review of the publicly available datasets in which depth and inertial data were captured simultaneously.
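
One of the fusion components such pipelines rely on is feature-level fusion, where descriptors computed separately from each modality are concatenated before classification. The sketch below is a minimal illustration only, assuming synchronized and pre-segmented data; the feature choices (a down-sampled depth motion descriptor and per-axis inertial statistics), the function names, and the SVM classifier are illustrative assumptions, not the method of any particular surveyed paper.

```python
# Minimal sketch of feature-level fusion of depth and inertial data for
# action recognition. All names and feature choices here are illustrative
# assumptions, not a specific method from the surveyed literature.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def depth_features(depth_frames: np.ndarray) -> np.ndarray:
    """Collapse a depth sequence (T x H x W) into a coarse motion descriptor:
    the sum of absolute frame differences, spatially down-sampled."""
    motion_map = np.abs(np.diff(depth_frames.astype(np.float32), axis=0)).sum(axis=0)
    return motion_map[::8, ::8].ravel()  # crude spatial down-sampling


def inertial_features(imu_samples: np.ndarray) -> np.ndarray:
    """Summarize an inertial segment (T x 6: accelerometer + gyroscope axes)
    with simple per-axis statistics."""
    return np.concatenate([imu_samples.mean(axis=0),
                           imu_samples.std(axis=0),
                           imu_samples.min(axis=0),
                           imu_samples.max(axis=0)])


def fused_feature(depth_frames: np.ndarray, imu_samples: np.ndarray) -> np.ndarray:
    # Feature-level fusion: concatenate the two modality descriptors.
    return np.concatenate([depth_features(depth_frames),
                           inertial_features(imu_samples)])


def train(depth_clips, imu_segments, labels):
    """Train a classifier on synchronized, segmented samples: one depth clip
    and one inertial segment per action instance, with integer labels."""
    X = np.stack([fused_feature(d, i) for d, i in zip(depth_clips, imu_segments)])
    clf = make_pipeline(StandardScaler(), SVC(kernel="linear"))
    clf.fit(X, labels)
    return clf
```

Decision-level fusion is the other common option: each modality is classified on its own and the class scores are then combined (e.g., by weighted averaging or Dempster-Shafer evidence combination) rather than concatenating features.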
