Learning to Recognize Human Activities Using Soft Labels

Human activity recognition system is of great importance in robot-care scenarios. Typically, training such a system requires activity labels to be both completely and accurately annotated. In this paper, we go beyond such restriction and propose a learning method that allow labels to be incomplete and uncertain. We introduce the idea of soft labels which allows annotators to assign multiple, and weighted labels to data segments. This is very useful in many situations, e.g., when the labels are uncertain, when part of the labels are missing, or when multiple annotators assign inconsistent labels. We formulate the activity recognition task as a sequential labeling problem. Latent variables are embedded in the model in order to exploit sub-level semantics for better estimation. We propose a max-margin framework which incorporate soft labels for learning the model parameters. The model is evaluated on two challenging datasets. To simulate the uncertainty in data annotation, we randomly change the labels for transition segments. The results show significant improvement over the state-of-the-art approach.

[1]  Daniel P. Huttenlocher,et al.  Efficient Graph-Based Image Segmentation , 2004, International Journal of Computer Vision.

[2]  Trevor Darrell,et al.  Hidden Conditional Random Fields for Gesture Recognition , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[3]  Hema Swetha Koppula,et al.  Learning Spatio-Temporal Structure from RGB-D Videos for Human Activity Detection and Anticipation , 2013, ICML.

[4]  Gwenn Englebienne,et al.  A two-layered approach to recognize high-level human activities , 2014, The 23rd IEEE International Symposium on Robot and Human Interactive Communication.

[5]  Thorsten Joachims,et al.  Learning structural SVMs with latent variables , 2009, ICML '09.

[6]  Stephen P. Boyd,et al.  Graph Implementations for Nonsmooth Convex Programs , 2008, Recent Advances in Learning and Control.

[7]  Christian Heath,et al.  IEEE International Symposium on Robot and Human Interactive Communication , 2009 .

[8]  Yang Wang,et al.  Max-margin hidden conditional random fields for human action recognition , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[9]  Matthai Philipose,et al.  Unsupervised Activity Recognition Using Automatically Mined Common Sense , 2005, AAAI.

[10]  Yun Jiang,et al.  Infinite Latent Conditional Random Fields for Modeling Environments through Humans , 2013, Robotics: Science and Systems.

[11]  Bart Selman,et al.  Human Activity Detection from RGBD Images , 2011, Plan, Activity, and Intent Recognition.

[12]  Greg Mori,et al.  Handling Uncertain Tags in Visual Recognition , 2013, 2013 IEEE International Conference on Computer Vision.

[13]  Gwenn Englebienne,et al.  Learning latent structure for activity recognition , 2014, 2014 IEEE International Conference on Robotics and Automation (ICRA).

[14]  Yang Wang,et al.  Hidden Part Models for Human Action Recognition: Probabilistic versus Max Margin , 2011, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[15]  Fernando De la Torre,et al.  Joint segmentation and classification of human actions in video , 2011, CVPR 2011.

[16]  Ben Taskar,et al.  Posterior Regularization for Structured Latent Variable Models , 2010, J. Mach. Learn. Res..

[17]  Fred A. Hamprecht,et al.  Structured Learning from Partial Annotations , 2012, ICML.

[18]  Gwenn Englebienne,et al.  Accurate activity recognition in a home setting , 2008, UbiComp.

[19]  Alan L. Yuille,et al.  The Concave-Convex Procedure (CCCP) , 2001, NIPS.

[20]  Hema Swetha Koppula,et al.  Learning human activities and object affordances from RGB-D videos , 2012, Int. J. Robotics Res..

[21]  Thomas Hofmann,et al.  Support vector machine learning for interdependent and structured output spaces , 2004, ICML.

[22]  Thorsten Joachims,et al.  Cutting-plane training of structural SVMs , 2009, Machine Learning.

[23]  Joris M. Mooij,et al.  libDAI: A Free and Open Source C++ Library for Discrete Approximate Inference in Graphical Models , 2010, J. Mach. Learn. Res..

[24]  Hema Swetha Koppula,et al.  Anticipating human activities for reactive robotic response , 2013, 2013 IEEE/RSJ International Conference on Intelligent Robots and Systems.

[25]  Manuela M. Veloso,et al.  Conditional random fields for activity recognition , 2007, AAMAS '07.

[26]  Fei-Fei Li,et al.  Learning latent temporal structure for complex event detection , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.