Human Activity Recognition for Domestic Robots

The capabilities of domestic service robots could be further improved if the robot were equipped with the ability to recognize activities performed by humans within its sensory range. For example, in a simple scenario, a floor-cleaning robot could vacuum the kitchen floor after recognizing the human activity "cooking in the kitchen". Most complex human activities can be subdivided into simple activities, which can later be used to recognize the complex ones. An activity such as "taking medication" can be subdivided into simple activities such as "opening a pill container" and "drinking water". However, even recognizing simple activities is highly challenging, owing to similarities between different activities and dissimilarities within the same activity when it is performed by different people, in different body poses, and at different orientations. Even a simple human activity like "drinking water" can be performed while the subject is sitting, standing, or walking. Building machine learning techniques that recognize human activities despite such complexities is therefore non-trivial. To address this issue, we propose a human activity recognition technique that uses 3D skeleton features produced by a depth camera. The algorithm assigns importance weights to the 3D skeleton joints according to the activity being performed, allowing it to ignore confusing or irrelevant features while relying on informative ones. The weighted joint features are then ensembled to train Dynamic Bayesian Networks (DBNs), which are used to infer human activities based on their likelihoods. The proposed activity recognition technique is tested on a publicly available dataset and on UTS experiments, achieving overall accuracies of 85% and 90%, respectively.
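As a rough illustration of the pipeline described above (not the paper's actual implementation), the following Python sketch shows how per-joint importance weights might be combined with per-activity likelihood models and a maximum-likelihood decision rule. The joint set, the weight values, and the frame-wise Gaussian model standing in for the DBN are all illustrative assumptions.

import numpy as np

# Hypothetical per-activity importance weights for a few skeleton joints.
# In the described method such weights depend on the activity; the values
# and the reduced joint set here are illustrative only.
JOINT_WEIGHTS = {
    "drinking water":          {"right_hand": 1.0, "head": 0.8, "left_foot": 0.1},
    "opening pill container":  {"right_hand": 1.0, "left_hand": 0.9, "left_foot": 0.1},
}

class WeightedGaussianActivityModel:
    """Simplified stand-in for a per-activity DBN: a single Gaussian over the
    importance-weighted 3D joint positions of each frame."""

    def __init__(self, activity, joint_order):
        self.activity = activity
        self.joint_order = joint_order
        self.mean = None
        self.var = None

    def _features(self, frames):
        # frames: (T, J, 3) array of joint positions; scale each joint by its weight.
        w = np.array([JOINT_WEIGHTS[self.activity].get(j, 0.0) for j in self.joint_order])
        return (frames * w[None, :, None]).reshape(len(frames), -1)

    def fit(self, frames):
        x = self._features(frames)
        self.mean = x.mean(axis=0)
        self.var = x.var(axis=0) + 1e-6   # avoid zero variance
        return self

    def log_likelihood(self, frames):
        x = self._features(frames)
        ll = -0.5 * (np.log(2 * np.pi * self.var) + (x - self.mean) ** 2 / self.var)
        return ll.sum()                   # sum over frames and feature dimensions

def classify(frames, models):
    """Pick the activity whose model assigns the highest log-likelihood."""
    return max(models, key=lambda a: models[a].log_likelihood(frames))

In the actual technique, the temporal structure of an activity would be captured by the DBN's state transitions rather than by an independent frame-wise Gaussian; the sketch only conveys the weighting and likelihood-comparison steps.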
