Complexity-Aware Assignment of Latent Values in Discriminative Models for Accurate Gesture Recognition

Many of the state-of-the-art algorithms for gesture recognition are based on Conditional Random Fields (CRFs). Successful approaches, such as the Latent-Dynamic CRFs, extend the CRF by incorporating latent variables, whose values are mapped to the values of the labels. In this paper we propose a novel methodology to set the latent values according to the gesture complexity. We use an heuristic that iterates through the samples associated with each label value, estimating their complexity. We then use it to assign the latent values to the label values. We evaluate our method on the task of recognizing human gestures from video streams. The experiments were performed in binary datasets, generated by grouping different labels. Our results demonstrate that our approach outperforms the arbitrary one in many cases, increasing the accuracy by up to 10%.

[1]  Bart Selman,et al.  Unstructured human activity detection from RGBD images , 2011, 2012 IEEE International Conference on Robotics and Automation.

[2]  Sebastian Nowozin,et al.  Structured Learning and Prediction in Computer Vision , 2011, Found. Trends Comput. Graph. Vis..

[3]  Li-Chen Fu,et al.  Active-learning assisted self-reconfigurable activity recognition in a dynamic environment , 2009, 2009 IEEE International Conference on Robotics and Automation.

[4]  Trevor Darrell,et al.  Latent-Dynamic Discriminative Models for Continuous Gesture Recognition , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[5]  Thales Vieira,et al.  High performance moves recognition and sequence segmentation based on key poses filtering , 2016, 2016 IEEE Winter Conference on Applications of Computer Vision (WACV).

[6]  Michael I. Jordan,et al.  On Discriminative vs. Generative Classifiers: A comparison of logistic regression and naive Bayes , 2001, NIPS.

[7]  Sven Behnke,et al.  PyStruct: learning structured prediction in python , 2014, J. Mach. Learn. Res..

[8]  Yale Song,et al.  Tracking body and hands for gesture recognition: NATOPS aircraft handling signals database , 2011, Face and Gesture 2011.

[9]  Kirsti Grobel,et al.  Video-Based Sign Language Recognition Using Hidden Markov Models , 1997, Gesture Workshop.

[10]  Gwenn Englebienne,et al.  A two-layered approach to recognize high-level human activities , 2014, The 23rd IEEE International Symposium on Robot and Human Interactive Communication.

[11]  Trevor Darrell,et al.  Hidden Conditional Random Fields , 2007, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[12]  Andrew McCallum,et al.  Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data , 2001, ICML.

[13]  Trevor Darrell,et al.  Hidden Conditional Random Fields for Gesture Recognition , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[14]  Gwenn Englebienne,et al.  Learning latent structure for activity recognition , 2014, 2014 IEEE International Conference on Robotics and Automation (ICRA).

[15]  Yale Song,et al.  Multi-view latent variable discriminative models for action recognition , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[16]  Andrew McCallum,et al.  An Introduction to Conditional Random Fields , 2010, Found. Trends Mach. Learn..

[17]  T. Kobayashi,et al.  A conversation robot using head gesture recognition as para-linguistic information , 2004, RO-MAN 2004. 13th IEEE International Workshop on Robot and Human Interactive Communication (IEEE Catalog No.04TH8759).

[18]  Cristian Sminchisescu,et al.  Conditional models for contextual human motion recognition , 2006, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.