Learning to Recognize Human Activities from Soft Labeled Data

Activity recognition is an important component of assistive robots, but training such a system usually requires a large, correctly labeled dataset. Most previous work allows only a single activity label per training segment, which is overly restrictive because labels are not always certain. It is therefore desirable to allow multiple labels for ambiguous segments. In this paper, we introduce soft labeling, which allows annotators to assign multiple weighted labels to a data segment. This is useful in many situations, e.g., when labels are uncertain, when some labels are missing, or when multiple annotators assign inconsistent labels. We treat activity recognition as a sequential labeling problem and embed latent variables to exploit sub-level semantics for better estimation. We propose a novel method for learning the model parameters from soft-labeled data in a max-margin framework. The model is evaluated on a challenging dataset (CAD-120) captured by an RGB-D sensor mounted on a robot. To simulate uncertainty in data annotation, we randomly change the labels of transition segments. The results show significant improvement over the state-of-the-art approach.
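As a rough illustration of what soft labels could look like in practice, the sketch below stores each segment's annotation as a set of weighted labels and covers the three situations mentioned above: disagreeing annotators, a missing label, and an uncertain label. The activity names, the helper functions, and the expected-Hamming-style loss are all assumptions made for this example; the abstract does not specify the representation or the loss used in the paper's max-margin formulation.

```python
from collections import Counter

# Hypothetical label set, chosen only for this illustration.
ACTIVITIES = ["reaching", "moving", "placing"]

def normalize(weights):
    """Scale non-negative label weights so that they sum to 1."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("a soft label needs at least one positive weight")
    return {label: w / total for label, w in weights.items()}

def merge_annotators(votes):
    """Turn one hard label per annotator into a soft label for a segment."""
    return normalize(Counter(votes))

def missing_label(activities):
    """Unannotated segment: spread the weight uniformly over all activities."""
    return {a: 1.0 / len(activities) for a in activities}

def soft_hamming_loss(predicted, soft_labels):
    """Expected number of mislabeled segments under the soft labels.

    This is an assumed loss for illustration, not the paper's definition.
    """
    return sum(1.0 - soft.get(p, 0.0) for p, soft in zip(predicted, soft_labels))

# One soft label per segment of a short sequence.
sequence = [
    {"reaching": 1.0},                         # confident single label
    merge_annotators(["reaching", "moving"]),  # two annotators disagree
    missing_label(ACTIVITIES),                 # label is missing
    normalize({"moving": 3, "placing": 1}),    # uncertain transition segment
]

prediction = ["reaching", "moving", "moving", "placing"]
loss = soft_hamming_loss(prediction, sequence)
```

With hard labels every segment contributes either 0 or 1 to such a loss; with soft labels a prediction that matches a partially weighted label is only partially penalized, which is one plausible way for label uncertainty to enter training.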
