Paper information - Action recognition using ensemble weighted multi-instance learning

This paper addresses the recognition of human actions in depth video data. Current state-of-the-art action recognition methods use hand-designed features, which are difficult to produce and time-consuming to extend to new modalities. We propose a novel 3.5D representation of a depth video for action recognition. The 3.5D graph of a depth video consists of a set of nodes that correspond to the joints of the human body. Each joint is represented by a set of spatio-temporal features, which are computed by an unsupervised learning approach. When occlusions occur, however, the 3D positions of the joints become noisy, which increases the intra-class variation within action classes. To address this problem, we propose the Ensemble Weighted Multi-Instance Learning approach (EnwMi) for the action recognition task, which accounts for both class imbalance and intra-class variation. We formulate action recognition with depth videos as a weighted multi-instance problem and integrate an ensemble learning method into the weighted multi-instance learning framework. Our approach is evaluated on the Microsoft Research Action3D dataset, and the results show that it outperforms state-of-the-art methods.
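The weighted multi-instance framing in the abstract can be illustrated with a small sketch. It is not the paper's EnwMi formulation: the abstract gives no equations, so the aggregation rule (a weighted average of per-joint scores), the voting rule (a weighted sum over ensemble members), and every name below (`weighted_bag_score`, `ensemble_predict`, the toy scorers and weights) are assumptions made for illustration only. A depth video is treated as a bag, each joint's feature vector as an instance, and a joint assumed to be occluded is given weight zero.

```python
from typing import Callable, List, Sequence

Instance = Sequence[float]   # hypothetical feature vector of one joint
Bag = List[Instance]         # one depth video, viewed as a bag of joint instances
Scorer = Callable[[Instance], float]


def weighted_bag_score(bag: Bag, instance_weights: Sequence[float], scorer: Scorer) -> float:
    """Weighted average of per-instance scores (an assumed aggregation rule)."""
    total = sum(instance_weights)
    if total == 0:
        raise ValueError("at least one instance weight must be non-zero")
    return sum(w * scorer(x) for w, x in zip(instance_weights, bag)) / total


def ensemble_predict(bag: Bag, instance_weights: Sequence[float],
                     scorers: Sequence[Scorer], member_weights: Sequence[float]) -> int:
    """Weighted vote over ensemble members; returns +1 or -1."""
    vote = sum(a * weighted_bag_score(bag, instance_weights, s)
               for a, s in zip(member_weights, scorers))
    return 1 if vote >= 0 else -1


# Toy data: three joints with 1-D features; the third is assumed to be occluded (noisy).
bag = [[1.0], [0.8], [-5.0]]
scorers = [lambda x: x[0] - 0.5, lambda x: 0.2 - x[0]]   # two hypothetical weak scorers
member_weights = [0.7, 0.3]                               # hypothetical ensemble weights

score = weighted_bag_score(bag, [1.0, 1.0, 0.0], scorers[0])
label_weighted = ensemble_predict(bag, [1.0, 1.0, 0.0], scorers, member_weights)
label_unweighted = ensemble_predict(bag, [1.0, 1.0, 1.0], scorers, member_weights)
print(score, label_weighted, label_unweighted)
```

In this toy case, giving the noisy joint zero weight yields the label 1, while equal weights let that joint dominate and flip the label to -1. How instance and member weights are actually learned in EnwMi is not stated in the abstract.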

Alois Knoll | Daniel Clarke | Guang Chen | Manuel Giuliani | Andre Gaschler
