Subject independent human action recognition using spatio-depth information and meta-cognitive RBF network

In this paper, we present a machine learning approach for subject-independent human action recognition using a depth camera, emphasizing the importance of depth in recognizing actions. The proposed approach uses flow information from all three dimensions to classify an action. We first compute the 2-D optical flow and then combine it with the depth image to obtain the depth flow (Z motion vectors). The resulting 3-D flow captures the dynamics of the action in space-time. Feature vectors are obtained by averaging the 3-D motion over a grid laid over the silhouette in a hierarchical fashion. These hierarchical fine-to-coarse windows capture the motion dynamics of the object at various scales. The extracted features are used to train a Meta-cognitive Radial Basis Function Network (McRBFN) that uses a Projection Based Learning (PBL) algorithm, henceforth referred to as PBL-McRBFN. PBL-McRBFN begins with zero hidden neurons and builds the network following the best human learning strategy, namely self-regulated learning in a meta-cognitive environment. When a sample is used for learning, PBL-McRBFN uses the sample overlapping conditions and a projection based learning algorithm to estimate the parameters of the network. The performance of PBL-McRBFN is compared with that of Support Vector Machine (SVM) and Extreme Learning Machine (ELM) classifiers, with every person and action represented in both the training and testing datasets. The performance study shows that PBL-McRBFN outperforms these classifiers in recognizing actions in 3-D. Further, a subject-independent study is conducted using a leave-one-subject-out strategy to test the generalization performance of PBL-McRBFN. This study shows that PBL-McRBFN is capable of generalizing actions accurately. The performance of the proposed approach is benchmarked on the Video Analytics Lab (VAL) dataset and the Berkeley Multimodal Human Action Database (MHAD).
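The feature extraction described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names `depth_flow` and `hierarchical_features`, the nearest-pixel lookup used to follow the 2-D flow, the use of the silhouette's bounding box, and the grid levels (1, 2, 4) are all assumptions made for illustration, since the abstract does not specify them. The 2-D optical flow is taken as given input.

```python
import numpy as np


def depth_flow(depth_prev, depth_next, flow_uv):
    """Z motion: change in depth along each pixel's 2-D flow vector (assumed scheme)."""
    h, w = depth_prev.shape
    ys, xs = np.mgrid[0:h, 0:w]
    # Follow the flow to the displaced pixel, rounding and clamping to the image.
    xn = np.clip(np.rint(xs + flow_uv[..., 0]).astype(int), 0, w - 1)
    yn = np.clip(np.rint(ys + flow_uv[..., 1]).astype(int), 0, h - 1)
    return depth_next[yn, xn] - depth_prev


def hierarchical_features(flow_xyz, mask, levels=(1, 2, 4)):
    """Average the 3-D flow over g x g grids on the silhouette's bounding box."""
    ys, xs = np.nonzero(mask)
    y0, y1 = ys.min(), ys.max() + 1
    x0, x1 = xs.min(), xs.max() + 1
    feats = []
    for g in levels:  # coarse (1 x 1) to fine (4 x 4); levels are hypothetical
        y_edges = np.linspace(y0, y1, g + 1).astype(int)
        x_edges = np.linspace(x0, x1, g + 1).astype(int)
        for i in range(g):
            for j in range(g):
                cell = (slice(y_edges[i], y_edges[i + 1]),
                        slice(x_edges[j], x_edges[j + 1]))
                inside = mask[cell]
                if inside.any():
                    # Mean (u, v, dz) over silhouette pixels in this window.
                    feats.append(flow_xyz[cell][inside].mean(axis=0))
                else:
                    feats.append(np.zeros(3))
    return np.concatenate(feats)


# Synthetic example: a 4 x 4 object moves one pixel right and 0.5 units closer.
depth_prev = np.full((8, 8), 3.0)
depth_prev[2:6, 2:6] = 2.0
depth_next = np.full((8, 8), 3.0)
depth_next[2:6, 3:7] = 1.5

mask = np.zeros((8, 8), dtype=bool)
mask[2:6, 2:6] = True

flow_uv = np.zeros((8, 8, 2))
flow_uv[mask, 0] = 1.0  # horizontal motion of one pixel inside the silhouette

dz = depth_flow(depth_prev, depth_next, flow_uv)
flow_xyz = np.dstack([flow_uv, dz])  # per-pixel (u, v, dz)
features = hierarchical_features(flow_xyz, mask)
print(features.shape)  # (63,)
```

With three grid levels the sketch yields 1 + 4 + 16 = 21 windows of three motion components each, so the feature vector has 63 entries. In practice the flow field would come from an optical flow estimator rather than being set by hand.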
