Action recognition based on hierarchical dynamic Bayesian network

In this paper, a novel action recognition method based on a hierarchical dynamic Bayesian network (HDBN) is proposed. The algorithm consists of two stages: system learning and action recognition. In the system learning stage, video features are first extracted using deep neural networks. A hierarchical action semantic dictionary (HASD) is then built through hierarchical clustering with manual assistance. Next, we construct an HDBN graphical model to represent the video sequence. In the recognition stage, we first obtain the representative frames of the unknown video using deep neural networks. Their features are fed into the HDBN, and HDBN inference produces the recognition result. Experimental results show that the proposed method is promising.
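The abstract does not give the HDBN structure, the HASD construction details, or the inference algorithm. The sketch below is therefore only a minimal illustration of the recognition stage under loudly stated assumptions. It assumes a two-level model. The top node is the action label, held fixed over the clip. The lower level is a hidden Markov chain of sub-action states, and each state emits an index into the semantic dictionary. Frame features are mapped to their nearest dictionary word, which is an assumed quantization step. The dictionary is given rather than learned by hierarchical clustering. All function names (`quantize`, `ActionModel`, `recognize`) and all numbers are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Tuple[float, ...]


def quantize(features: Sequence[Vector], dictionary: Sequence[Vector]) -> List[int]:
    """Map each frame feature vector to the index of its nearest dictionary word
    (squared Euclidean distance), turning a video into a sequence of word indices."""
    words = []
    for f in features:
        dists = [sum((a - b) ** 2 for a, b in zip(f, w)) for w in dictionary]
        words.append(dists.index(min(dists)))
    return words


class ActionModel:
    """Lower level of a hypothetical two-level model: for one action, a hidden
    chain of sub-action states, each emitting a dictionary word."""

    def __init__(self, prior, trans, emit):
        self.prior = prior  # prior[i]    = P(s_1 = i | action)
        self.trans = trans  # trans[i][j] = P(s_t = j | s_{t-1} = i, action)
        self.emit = emit    # emit[i][k]  = P(w_t = k | s_t = i, action)

    def log_likelihood(self, words: Sequence[int]) -> float:
        """log P(w_1..w_T | action) via the scaled forward algorithm."""
        if not words:
            raise ValueError("empty observation sequence")
        n = len(self.prior)
        alpha = [self.prior[i] * self.emit[i][words[0]] for i in range(n)]
        log_lik = 0.0
        for t in range(len(words)):
            if t > 0:
                # Propagate through the transition matrix, then weight by the emission.
                alpha = [
                    sum(alpha[i] * self.trans[i][j] for i in range(n)) * self.emit[j][words[t]]
                    for j in range(n)
                ]
            scale = sum(alpha)  # equals P(w_t | w_1..w_{t-1}, action)
            if scale == 0.0:
                return float("-inf")
            log_lik += math.log(scale)
            alpha = [a / scale for a in alpha]  # renormalize to avoid underflow
        return log_lik


def recognize(words: Sequence[int], models: Dict[str, ActionModel],
              action_prior: Dict[str, float]) -> Tuple[str, Dict[str, float]]:
    """Top level: P(a | words) is proportional to P(a) * P(words | a).
    Returns the most probable action and the full posterior (priors must be > 0)."""
    scores = {a: math.log(action_prior[a]) + m.log_likelihood(words) for a, m in models.items()}
    best = max(scores.values())
    if best == float("-inf"):
        raise ValueError("no action model can explain the observations")
    z = sum(math.exp(s - best) for s in scores.values())  # log-sum-exp normalizer
    posterior = {a: math.exp(s - best) / z for a, s in scores.items()}
    return max(posterior, key=posterior.get), posterior


# Toy example (all numbers invented for illustration): a 2-word dictionary and two actions.
dictionary = [(0.0, 0.0), (1.0, 1.0)]
features = [(0.1, 0.0), (0.9, 1.1), (0.0, 0.2), (1.0, 0.8)]
emit = [[0.9, 0.1], [0.1, 0.9]]
models = {
    "wave": ActionModel([0.5, 0.5], [[0.1, 0.9], [0.9, 0.1]], emit),  # sub-actions alternate
    "walk": ActionModel([0.5, 0.5], [[0.9, 0.1], [0.1, 0.9]], emit),  # sub-actions persist
}
words = quantize(features, dictionary)
label, posterior = recognize(words, models, {"wave": 0.5, "walk": 0.5})
print(words, label, posterior)
```

Because the action node is assumed constant over the clip, exact inference in this simplified model reduces to one forward pass per action followed by Bayes' rule. A fuller HDBN that lets the action change within a video would need joint inference over both levels.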
