UTD-MHAD: A multimodal dataset for human action recognition utilizing a depth camera and a wearable inertial sensor

Human action recognition has a wide range of applications, including biometrics, surveillance, and human-computer interaction. The use of multimodal sensors for human action recognition is steadily increasing. However, few publicly available datasets capture depth camera and inertial sensor data at the same time. This paper describes a freely available dataset, named UTD-MHAD, which consists of four temporally synchronized data modalities: RGB videos, depth videos, and skeleton positions from a Kinect camera, and inertial signals from a wearable inertial sensor, covering a comprehensive set of 27 human actions. Experimental results are provided to show how this dataset can be used to study fusion approaches that use both depth camera data and inertial sensor data. This public domain dataset benefits multimodal research on human action recognition conducted by various research groups.
