Minimum uncertainty latent variable models for robot recognition of sequential human activities

Recognition of sequential human activities, such as “sitting down” and “standing up”, is a common but challenging problem in human-robot interaction, which requires modeling their underlying temporal patterns. Although previous sequence modeling methods, such as Hidden Conditional Random Fields (HCRFs), demonstrated satisfactory recognition accuracy, they do not explicitly model the uncertainty in underlying temporal patterns, which can provide valuable information to characterize sequential activities. To address this problem, we introduce a novel Minimum Uncertainty HCRF (MU, or μHCRF). Different from traditional HCRF-based techniques that only utilize the negative log-likelihood of the categories' conditional probability as the loss function, the proposed μ-HCRF also introduces a regularization term to model the underlying temporal pattern of the latent variables. As another theoretical contribution, we provide a derivation to show that the formulated problem has a closed-form solution, and prove that inference of the proposed μHCRF is tractable. Extensive empirical study is performed to evaluate our approach, using four public benchmark datasets. Experimental results have shown that our μHCRFs outperform previous techniques and achieve state-of-the-art performance on human activity recognition, especially on sequential activities.

[1]  Xiaodong Yang,et al.  EigenJoints-based action recognition using Naïve-Bayes-Nearest-Neighbor , 2012, 2012 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops.

[2]  Yoshua Bengio,et al.  Semi-supervised Learning by Entropy Minimization , 2004, CAP.

[3]  Lynne E. Parker,et al.  Adaptive human-centered representation for activity recognition of multiple individuals from 3D point cloud sequences , 2015, 2015 IEEE International Conference on Robotics and Automation (ICRA).

[4]  Richard P. Wildes,et al.  Action Spotting and Recognition Based on a Spatiotemporal Orientation Analysis , 2013, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[5]  G. Klir Uncertainty and Information: Foundations of Generalized Information Theory , 2005 .

[6]  Nir Friedman,et al.  Probabilistic Graphical Models - Principles and Techniques , 2009 .

[7]  Matthew Brand Structure and parameter learning via entropy minimization, with applications to mixture and hidden Markov models , 1999, 1999 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings. ICASSP99 (Cat. No.99CH36258).

[8]  Trevor Darrell,et al.  Hidden Conditional Random Fields , 2007, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[9]  Yang Wang,et al.  Hidden Part Models for Human Action Recognition: Probabilistic versus Max Margin , 2011, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[10]  Junsong Yuan,et al.  Learning Actionlet Ensemble for 3D Human Action Recognition , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[11]  Yale Song,et al.  Multi-signal gesture recognition using temporal smoothing hidden conditional random fields , 2011, Face and Gesture 2011.

[12]  Rama Chellappa,et al.  Machine Recognition of Human Activities: A Survey , 2008, IEEE Transactions on Circuits and Systems for Video Technology.

[13]  Patrick Bouthemy,et al.  Better Exploiting Motion for Better Action Recognition , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[14]  Serge J. Belongie,et al.  Behavior recognition via sparse spatio-temporal features , 2005, 2005 IEEE International Workshop on Visual Surveillance and Performance Evaluation of Tracking and Surveillance.

[15]  Ying Wu,et al.  Robust 3D Action Recognition with Random Occupancy Patterns , 2012, ECCV.

[16]  Sören Sonnenburg,et al.  Optimized Cutting Plane Algorithm for Large-Scale Risk Minimization , 2009, J. Mach. Learn. Res..

[17]  Ivan Laptev,et al.  On Space-Time Interest Points , 2005, International Journal of Computer Vision.

[18]  Dong Han,et al.  Selection and context for action recognition , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[19]  Thomas B. Moeslund,et al.  Selective spatio-temporal interest points , 2012, Comput. Vis. Image Underst..

[20]  Gavin Brown,et al.  Conditional Likelihood Maximisation: A Unifying Framework for Information Theoretic Feature Selection , 2012, J. Mach. Learn. Res..

[21]  Cordelia Schmid,et al.  Actions in context , 2009, CVPR.

[22]  Qiang Ji,et al.  Knowledge Based Activity Recognition with Dynamic Bayesian Network , 2010, ECCV.

[23]  Trevor Darrell,et al.  Latent-Dynamic Discriminative Models for Continuous Gesture Recognition , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[24]  Shang-Hong Lai,et al.  Learning partially-observed hidden conditional random fields for facial expression recognition , 2009, CVPR.

[25]  Cordelia Schmid,et al.  Action Recognition with Improved Trajectories , 2013, 2013 IEEE International Conference on Computer Vision.

[26]  Ivan Laptev,et al.  Improving bag-of-features action recognition with non-local cues , 2010, BMVC.

[27]  Daphne Koller,et al.  Modeling Latent Variable Uncertainty for Loss-based Learning , 2012, ICML.

[28]  Thierry Artières,et al.  Regularized bundle methods for convex and non-convex risks , 2012, J. Mach. Learn. Res..

[29]  Zicheng Liu,et al.  HON4D: Histogram of Oriented 4D Normals for Activity Recognition from Depth Sequences , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[30]  Cordelia Schmid,et al.  Dense Trajectories and Motion Boundary Descriptors for Action Recognition , 2013, International Journal of Computer Vision.

[31]  Thomas Serre,et al.  HMDB: A large video database for human motion recognition , 2011, 2011 International Conference on Computer Vision.

[32]  Lasitha Piyathilaka,et al.  Gaussian mixture based HMM for human daily activity recognition using 3D skeleton features , 2013, 2013 IEEE 8th Conference on Industrial Electronics and Applications (ICIEA).

[33]  Yale Song,et al.  Action Recognition by Hierarchical Sequence Summarization , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[34]  Bart Selman,et al.  Unstructured human activity detection from RGBD images , 2011, 2012 IEEE International Conference on Robotics and Automation.

[35]  J.K. Aggarwal,et al.  Human activity analysis , 2011, ACM Comput. Surv..

[36]  Andrew Gilbert,et al.  Fast realistic multi-action recognition using mined dense spatio-temporal features , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[37]  K. R. Ramakrishnan,et al.  A Cause and Effect Analysis of Motion Trajectories for Modeling Actions , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[38]  Alexander J. Smola,et al.  Bundle Methods for Regularized Risk Minimization , 2010, J. Mach. Learn. Res..

[39]  Bingbing Ni,et al.  Order-Preserving Sparse Coding for Sequence Classification , 2012, ECCV.

[40]  Andrew McCallum,et al.  Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data , 2001, ICML.