Online robust action recognition based on a hierarchical model

With the growing demand for human-machine interaction, action recognition has attracted increasing attention in recent years. Traditional video-based approaches are highly sensitive to background activity and lack the ability to discriminate complex 3D motion. With the emergence and development of commercial depth cameras, action recognition based on 3D skeleton joints has become increasingly popular. However, skeleton-based approaches remain challenging because of the large variation in human actions and their temporal dynamics. In this paper, we propose a hierarchical model for action recognition. To handle confusing motions in a large feature space, a motion-based grouping method is first proposed, which efficiently assigns each video a group label; then, for each group, a pre-trained classifier is used for frame labeling. Unlike previous methods, we adopt a bottom-up approach that first performs action recognition on each frame. The final action label is obtained by fusing the classifications of its frames, with the contribution of each frame adaptively adjusted according to its local properties. The proposed method is evaluated on two challenging datasets captured with a Kinect sensor. Experiments show that our method performs more robustly than state-of-the-art approaches.
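To make the hierarchical pipeline concrete, the sketch below illustrates one plausible reading of the described flow: assign a skeleton sequence to a motion group, label each frame with that group's pre-trained classifier, and fuse the frame-level predictions with adaptive weights. This is a minimal illustration, not the authors' implementation; the group-assignment rule, the scikit-learn-style `predict_proba` interface, and the motion-energy weighting are assumptions introduced here for clarity.

```python
# Minimal sketch of the hierarchical recognition pipeline described above.
# Assumes a skeleton sequence is a (num_frames, num_joints * 3) NumPy array.
# Group centroids, per-group frame classifiers, and the adaptive weighting
# rule are hypothetical placeholders, not the paper's exact components.
import numpy as np

def assign_group(sequence, group_centroids):
    """Assign a coarse motion group by comparing the sequence's mean
    frame-to-frame motion against per-group centroids (illustrative rule)."""
    motion = np.abs(np.diff(sequence, axis=0)).mean(axis=0)
    distances = [np.linalg.norm(motion - c) for c in group_centroids]
    return int(np.argmin(distances))

def classify_sequence(sequence, group_centroids, frame_classifiers):
    """Label each frame with the classifier of the assigned group, then fuse
    the frame labels into a sequence label using adaptive per-frame weights."""
    group = assign_group(sequence, group_centroids)
    clf = frame_classifiers[group]              # pre-trained per-group model
    frame_probs = clf.predict_proba(sequence)   # (num_frames, num_actions)

    # Adaptive weighting: emphasize frames with more local motion
    # (one plausible "local property"; the paper's exact rule may differ).
    motion_energy = np.linalg.norm(np.diff(sequence, axis=0), axis=1)
    motion_energy = np.concatenate([[motion_energy[0]], motion_energy])
    weights = motion_energy / (motion_energy.sum() + 1e-8)

    # Fuse per-frame class probabilities into a single sequence-level label.
    fused = (frame_probs * weights[:, None]).sum(axis=0)
    return int(np.argmax(fused))
```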
