A new action recognition method by distinguishing ambiguous postures

Reliably recognizing human actions and accurately understanding human behaviors and intentions are among the most important steps toward more intelligent home service robots. Many postures that arise in action recognition are ambiguous, which lowers recognition accuracy. To improve the reliability of the service provided by home service robots, this article presents a probabilistic soft-assignment recognition scheme based on Gaussian mixture models for recognizing similar actions. First, we generate a representative posture dictionary based on the standard bag-of-words model. Then, a Gaussian mixture model is introduced for the similar poses. Finally, weighted voting, combined with the naive Bayes principle, is used to recognize the action. The proposed scheme is verified by recognizing four types of daily actions, and the experimental results show its effectiveness.
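The three steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and everything in it is an assumption made for illustration: the function names are hypothetical, the dictionary is built with plain k-means, the Gaussian mixture is simplified to equal-weight isotropic components centred on the codewords with a hand-set `sigma` (a fitted mixture would learn weights and covariances), and the voting rule sums per-frame log-likelihoods weighted by the soft assignments.

```python
import numpy as np

def build_dictionary(features, k, iters=20, seed=0):
    """Step 1 (assumed k-means): pick k representative postures (codewords)."""
    rng = np.random.default_rng(seed)
    centers = features[rng.choice(len(features), size=k, replace=False)]
    for _ in range(iters):
        # Squared distance from every frame to every codeword.
        dist = ((features[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        nearest = dist.argmin(axis=1)
        for j in range(k):
            if np.any(nearest == j):  # leave empty clusters where they are
                centers[j] = features[nearest == j].mean(axis=0)
    return centers

def soft_assign(features, centers, sigma=1.0):
    """Step 2 (simplified mixture): responsibility of each codeword per frame."""
    dist = ((features[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    log_p = -dist / (2.0 * sigma ** 2)
    log_p -= log_p.max(axis=1, keepdims=True)  # numerical stability
    resp = np.exp(log_p)
    return resp / resp.sum(axis=1, keepdims=True)

def fit_word_given_action(resp, labels, n_actions, alpha=1.0):
    """Estimate P(codeword | action) from soft counts with Laplace smoothing."""
    counts = np.full((n_actions, resp.shape[1]), alpha)
    for a in range(n_actions):
        counts[a] += resp[labels == a].sum(axis=0)
    return counts / counts.sum(axis=1, keepdims=True)

def classify_sequence(resp, word_given_action):
    """Step 3 (assumed rule): naive Bayes style weighted voting over frames."""
    # Each frame votes for every action with its responsibility-weighted
    # log-likelihood; frames are treated as independent and votes are summed.
    votes = resp @ np.log(word_given_action).T  # shape: (frames, actions)
    return int(votes.sum(axis=0).argmax())
```

Under these assumptions, a frame that lies between two similar codewords splits its vote between them instead of committing to a single, possibly wrong, posture, which is the motivation for soft assignment over hard bag-of-words quantization.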
