Improving Human Action Recognition Using Fusion of Depth Camera and Inertial Sensors

This paper presents a fusion approach for improving human action recognition based on two sensors of differing modalities: a depth camera and an inertial body sensor. Computationally efficient action features are extracted from the depth images provided by the depth camera and from the accelerometer signals provided by the inertial body sensor; these features consist of depth motion maps and statistical signal attributes, respectively. For action recognition, both feature-level fusion and decision-level fusion are examined using a collaborative representation classifier. In feature-level fusion, the features generated by the two sensors are merged before classification, while in decision-level fusion, Dempster-Shafer theory is used to combine the classification outcomes of two classifiers, one per sensor. The introduced fusion framework is evaluated on the Berkeley Multimodal Human Action Database. The results indicate that, owing to the complementary nature of the data from the two sensors, the introduced fusion approaches improve recognition rates by 2% to 23%, depending on the action, over using either sensor individually.
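The collaborative representation classifier used here represents a test sample as a regularized linear combination of all training samples and assigns the class whose training samples contribute the smallest reconstruction residual. The following is a minimal sketch of that idea (the CRC-RLS formulation); the function name, the regularization value, and the data layout (one training sample per column) are illustrative choices, not taken from the paper.

```python
import numpy as np

def crc_classify(X_train, y_train, x_test, lam=0.01):
    """Collaborative representation classification (CRC-RLS sketch).

    X_train: (d, n) matrix, one training sample per column.
    y_train: length-n array of class labels.
    x_test:  length-d feature vector (e.g. concatenated depth and
             inertial features for feature-level fusion).
    """
    # Ridge-regularized coding: alpha = (X^T X + lam*I)^-1 X^T x
    G = X_train.T @ X_train + lam * np.eye(X_train.shape[1])
    alpha = np.linalg.solve(G, X_train.T @ x_test)

    classes = np.unique(y_train)
    residuals = []
    for c in classes:
        mask = (y_train == c)
        # Reconstruct the test sample using only this class's coefficients
        recon = X_train[:, mask] @ alpha[mask]
        residuals.append(np.linalg.norm(x_test - recon))
    # Assign the class with the smallest class-wise residual
    return classes[int(np.argmin(residuals))]
```

In the feature-level scheme described above, `x_test` would simply be the concatenation of the two sensors' feature vectors before this classifier is applied.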
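For the decision-level scheme, Dempster's rule combines the per-class evidence produced by the two sensor-specific classifiers. The sketch below assumes each classifier's output has already been converted to a mass function over singleton action hypotheses (a simplification; general Dempster-Shafer theory assigns mass to arbitrary subsets of hypotheses), and the dict-based interface is illustrative rather than the paper's implementation.

```python
def dempster_combine(m1, m2):
    """Combine two mass functions with Dempster's rule of combination.

    m1, m2: dicts mapping action label -> mass, each summing to 1.
    Only singleton focal elements are handled in this sketch.
    """
    combined = {}
    conflict = 0.0
    for a, ma in m1.items():
        for b, mb in m2.items():
            if a == b:
                # Agreeing evidence reinforces the shared hypothesis
                combined[a] = combined.get(a, 0.0) + ma * mb
            else:
                # Disagreeing evidence accumulates as conflict
                conflict += ma * mb
    if conflict >= 1.0:
        raise ValueError("Total conflict: sources cannot be combined")
    # Normalize by the non-conflicting mass
    norm = 1.0 - conflict
    return {label: mass / norm for label, mass in combined.items()}
```

The fused decision is then the action with the largest combined mass, e.g. `max(fused, key=fused.get)`.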
