Human Activity Detection from RGBD Images

Being able to detect and recognize human activities is essential for personal assistant robots to perform assistive tasks usefully. The challenge is to develop a system that is low-cost, reliable in unstructured home settings, and straightforward to use. In this paper, we use an RGBD sensor (Microsoft Kinect) as the input sensor, and present learning algorithms to infer the activities. Our algorithm is based on a hierarchical maximum entropy Markov model (MEMM). It models a person's activity as composed of a set of sub-activities, and infers the two-layered graph structure using a dynamic programming approach. We test our algorithm on detecting and recognizing twelve different activities performed by four people in different environments, such as a kitchen, a living room, and an office, and achieve an average performance of 84.3% when the person was seen before in the training set (and 64.2% when the person was not seen before).
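The abstract's description of inference is concrete enough to sketch in code. Below is a minimal, hypothetical Python illustration of the dynamic-programming (Viterbi-style) decoding used in one layer of an MEMM; the function name, array layout, and the split of the MEMM conditional into separate transition and observation log-scores are assumptions made for clarity, not the authors' implementation. In the full two-layer model, the same recursion would decode sub-activity labels per frame, with activity-level inference composed on top.

```python
import numpy as np

def viterbi_memm(obs_scores, trans_scores):
    """Most likely state sequence under a (simplified) MEMM layer.

    obs_scores[t, s]    : log-score of state s given the frame at time t (hypothetical)
    trans_scores[s, s2] : log-score of moving from state s to state s2   (hypothetical)
    """
    T, S = obs_scores.shape
    delta = np.full((T, S), -np.inf)    # best log-score of any path ending in s at time t
    back = np.zeros((T, S), dtype=int)  # backpointers for recovering that path

    delta[0] = obs_scores[0]
    for t in range(1, T):
        # An MEMM conditions the next state on the previous state and the
        # current observation, so both score terms enter the recursion.
        cand = delta[t - 1][:, None] + trans_scores + obs_scores[t][None, :]
        back[t] = cand.argmax(axis=0)
        delta[t] = cand.max(axis=0)

    # Backtrack from the best final state.
    path = [int(delta[-1].argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]

# Toy usage: 5 frames, 3 sub-activity states, random normalized scores.
rng = np.random.default_rng(0)
obs = np.log(rng.dirichlet(np.ones(3), size=5))    # per-frame state scores
trans = np.log(rng.dirichlet(np.ones(3), size=3))  # row-stochastic transitions
print(viterbi_memm(obs, trans))                    # one sub-activity label per frame
```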
