Paper information - Subject independent human action recognition using spatio-depth information and meta-cognitive RBF network

In this paper, we present a machine learning approach for subject-independent human action recognition using a depth camera, emphasizing the importance of depth in recognizing actions. The proposed approach uses flow information in all three dimensions to classify an action. We first compute the 2-D optical flow and then combine it with the depth image to obtain the depth flow (Z motion vectors). The resulting 3-D flow captures the dynamics of the actions in space-time. Feature vectors are obtained by averaging the 3-D motion over a grid laid over the silhouette in a hierarchical fashion. These hierarchical fine-to-coarse windows capture the motion dynamics of the object at various scales. The extracted features are used to train a Meta-cognitive Radial Basis Function Network (McRBFN) that uses a Projection Based Learning (PBL) algorithm, henceforth referred to as PBL-McRBFN. PBL-McRBFN begins with zero hidden neurons and builds the network based on the best human learning strategy, namely self-regulated learning in a meta-cognitive environment. When a sample is used for learning, PBL-McRBFN uses sample overlapping conditions and a projection based learning algorithm to estimate the parameters of the network. The performance of PBL-McRBFN is compared with that of Support Vector Machine (SVM) and Extreme Learning Machine (ELM) classifiers, with every person and action represented in both the training and testing datasets. The performance study shows that PBL-McRBFN outperforms these classifiers in recognizing actions in 3-D. Further, a subject-independent study is conducted using a leave-one-subject-out strategy to test generalization performance. This study shows that PBL-McRBFN is capable of generalizing actions accurately. The proposed approach is benchmarked on the Video Analytics Lab (VAL) dataset and the Berkeley Multi-modal Human Action Database (MHAD).
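
The feature extraction step described above can be sketched roughly as follows. This is only an illustrative NumPy sketch under stated assumptions, not the authors' implementation: the 2-D flow fields `u` and `v` are taken as given, the depth flow is assumed to be the depth change observed when a pixel is followed along its 2-D flow vector, and the grid levels `(1, 2, 4)` as well as the function names `depth_flow` and `hierarchical_features` are hypothetical choices made for the example.

```python
import numpy as np

def depth_flow(depth_t, depth_t1, u, v):
    """Z motion per pixel (assumed definition): depth at the flow-displaced
    position in the next frame minus depth at the current position."""
    h, w = depth_t.shape
    ys, xs = np.mgrid[0:h, 0:w]
    # Follow each pixel along its 2-D flow vector, rounding to the nearest
    # pixel and clamping to the image borders.
    xt = np.clip(np.rint(xs + u).astype(int), 0, w - 1)
    yt = np.clip(np.rint(ys + v).astype(int), 0, h - 1)
    return depth_t1[yt, xt].astype(float) - depth_t.astype(float)

def hierarchical_features(u, v, w_flow, mask, levels=(1, 2, 4)):
    """Average the 3-D motion over n x n grids laid over the silhouette's
    bounding box, for each n in `levels` (hypothetical coarse-to-fine choice)."""
    ys, xs = np.nonzero(mask)
    if ys.size == 0:
        # Empty silhouette: return an all-zero feature vector.
        return np.zeros(3 * sum(n * n for n in levels))
    y0, y1 = ys.min(), ys.max() + 1
    x0, x1 = xs.min(), xs.max() + 1
    flow = np.stack([u, v, w_flow], axis=-1)[y0:y1, x0:x1]
    m = mask[y0:y1, x0:x1].astype(bool)
    feats = []
    for n in levels:
        yb = np.linspace(0, m.shape[0], n + 1).astype(int)
        xb = np.linspace(0, m.shape[1], n + 1).astype(int)
        for i in range(n):
            for j in range(n):
                cell = m[yb[i]:yb[i + 1], xb[j]:xb[j + 1]]
                vals = flow[yb[i]:yb[i + 1], xb[j]:xb[j + 1]][cell]
                # Mean (u, v, w) over silhouette pixels; zeros if the cell is empty.
                feats.append(vals.mean(axis=0) if vals.size else np.zeros(3))
    return np.concatenate(feats)
```

With the assumed levels, each frame pair yields a vector of 3 × (1 + 4 + 16) = 63 values, which would then be passed to whichever classifier is being trained.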
