Video-based human action and hand gesture recognition by fusing factored matrices of dual tensors

In this paper, we present a novel approach for human action and gesture recognition using dual-complementary tensors. In particular, the proposed method constructs a compact and yet discriminative representation by normalizing the input video volume into dual tensors. One tensor is obtained from the raw video volume data and the other one is obtained from the histogram of oriented gradients (HOG) features. Each tensor is converted to factored matrices and the similarity between factored matrices is evaluated using canonical correlation analysis (CCA). We, furthermore, propose an information fusion method to combine the resulting similarity scores. The proposed fusion strategy can effectively enhance discriminability between different action categories and lead to better recognition accuracy. We have conducted several experiments on two publicly available databases (UCF sports and Cambridge-Gesture). The results show that our proposed method achieves comparable recognition accuracy as the state-of-the-art methods.

[1]  Cordelia Schmid,et al.  Dense Trajectories and Motion Boundary Descriptors for Action Recognition , 2013, International Journal of Computer Vision.

[2]  Luc Van Gool,et al.  Latent Dictionary Learning for Sparse Representation Based Classification , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[3]  Tinne Tuytelaars,et al.  Modeling video evolution for action recognition , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[4]  Bruce A. Draper,et al.  Scalable action recognition with a subspace forest , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[5]  Indriyati Atmosukarto,et al.  Action Recognition Using Discriminative Structured Trajectory Groups , 2015, 2015 IEEE Winter Conference on Applications of Computer Vision.

[6]  Radha Poovendran,et al.  Group Event Detection With a Varying Number of Group Members for Video Surveillance , 2010, IEEE Transactions on Circuits and Systems for Video Technology.

[7]  Feng Shi,et al.  Sampling Strategies for Real-Time Action Recognition , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[8]  Michael J. Black,et al.  Secrets of optical flow estimation and their principles , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[9]  Tae-Kyun Kim,et al.  Canonical Correlation Analysis of Video Volume Tensors for Action Categorization and Detection , 2009, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[10]  Ling Shao,et al.  Spatio-Temporal Laplacian Pyramid Coding for Action Recognition , 2014, IEEE Transactions on Cybernetics.

[11]  Tamara G. Kolda,et al.  Tensor Decompositions and Applications , 2009, SIAM Rev..

[12]  Jason J. Corso,et al.  Action bank: A high-level representation of activity in video , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[13]  Yui Man Lui,et al.  Tangent Bundles on Special Manifolds for Action Recognition , 2012, IEEE Transactions on Circuits and Systems for Video Technology.

[14]  J. Ross Beveridge,et al.  Tangent bundle for human action recognition , 2011, Face and Gesture 2011.

[15]  Günther Palm,et al.  A generic framework for the inference of user states in human computer interaction , 2012, Journal on Multimodal User Interfaces.

[16]  Yongdong Zhang,et al.  Automatic Detection and Analysis of Player Action in Moving Background Sports Video Sequences , 2010, IEEE Transactions on Circuits and Systems for Video Technology.

[17]  Tae-Kyun Kim,et al.  Tensor Canonical Correlation Analysis for Action Classification , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[18]  Adriana Kovashka,et al.  Learning a hierarchy of discriminative space-time neighborhood features for human action recognition , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[19]  Chunfeng Yuan,et al.  Multi-task Sparse Learning with Beta Process Prior for Action Recognition , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[20]  Dong Xu,et al.  Action Recognition Using Multilevel Features and Latent Structural SVM , 2013, IEEE Transactions on Circuits and Systems for Video Technology.

[21]  Alessio Del Bue,et al.  Human behavior analysis in video surveillance: A Social Signal Processing perspective , 2013, Neurocomputing.

[22]  Brian C. Lovell,et al.  Kernel analysis on Grassmann manifolds for action recognition , 2013, Pattern Recognit. Lett..

[23]  IEEE conference on computer vision and pattern recognition , 2000, Proceedings IEEE Conference on Computer Vision and Pattern Recognition. CVPR 2000 (Cat. No.PR00662).

[24]  Jean Meunier,et al.  Robust Video Surveillance for Fall Detection Based on Human Shape Deformation , 2011, IEEE Transactions on Circuits and Systems for Video Technology.

[25]  Yuting Su,et al.  Multiple/Single-View Human Action Recognition via Part-Induced Multitask Structural Learning , 2015, IEEE Transactions on Cybernetics.

[26]  Serge J. Belongie,et al.  Integral Channel Features - Addendum , 2009 .

[27]  Yale Song,et al.  Continuous body and hand gesture recognition for natural human-computer interaction , 2012, TIIS.

[28]  Tanaya Guha,et al.  Learning Sparse Representations for Human Action Recognition , 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[29]  Pietro Perona,et al.  Integral Channel Features , 2009, BMVC.

[30]  Arun Ross,et al.  Information fusion in biometrics , 2003, Pattern Recognit. Lett..

[31]  Iasonas Kokkinos,et al.  Discovering discriminative action parts from mid-level video representations , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[32]  Sukhendu Das,et al.  MST-CSS (Multi-Spectro-Temporal Curvature Scale Space), a Novel Spatio-Temporal Representation for Content-Based Video Retrieval , 2010, IEEE Transactions on Circuits and Systems for Video Technology.

[33]  Janusz Konrad,et al.  A gesture-driven computer interface using Kinect , 2012, 2012 IEEE Southwest Symposium on Image Analysis and Interpretation.

[34]  Gattigorla Nagendar,et al.  Action Recognition Using Canonical Correlation Kernels , 2012, ACCV.

[35]  Xiao Liu,et al.  LF-EME: Local features with elastic manifold embedding for human action recognition , 2013, Neurocomputing.

[36]  No Value,et al.  IEEE International Conference on Image Processing , 2003 .

[37]  Cordelia Schmid,et al.  Action recognition by dense trajectories , 2011, CVPR 2011.

[38]  Dong Xu,et al.  Action recognition using context and appearance distribution features , 2011, CVPR 2011.

[39]  Sheng Tang,et al.  Localized Multiple Kernel Learning for Realistic Human Action Recognition in Videos , 2011, IEEE Transactions on Circuits and Systems for Video Technology.

[40]  Ling Shao,et al.  Relevance feedback for real-world human action retrieval , 2012, Pattern Recognit. Lett..

[41]  Jian-Huang Lai,et al.  Supervised Spatio-Temporal Neighborhood Topology Learning for Action Recognition , 2013, IEEE Transactions on Circuits and Systems for Video Technology.

[42]  Ling Shao,et al.  Embedding Motion and Structure Features for Action Recognition , 2013, IEEE Transactions on Circuits and Systems for Video Technology.

[43]  Bill Triggs,et al.  Histograms of oriented gradients for human detection , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[44]  Serge J. Belongie,et al.  Behavior recognition via sparse spatio-temporal features , 2005, 2005 IEEE International Workshop on Visual Surveillance and Performance Evaluation of Tracking and Surveillance.

[45]  Luca Benini,et al.  Gesture Recognition in Ego-centric Videos Using Dense Trajectories and Hand Segmentation , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition Workshops.

[46]  Yui Man Lui,et al.  Human gesture recognition on product manifolds , 2012, J. Mach. Learn. Res..

[47]  Ling Shao,et al.  Content-based retrieval of human actions from realistic video databases , 2013, Inf. Sci..

[48]  J. Ross Beveridge,et al.  Action classification on product manifolds , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[49]  Quoc V. Le,et al.  Learning hierarchical invariant spatio-temporal features for action recognition with independent subspace analysis , 2011, CVPR 2011.

[50]  Q. M. Jonathan Wu,et al.  Incremental Learning in Human Action Recognition Based on Snippets , 2012, IEEE Transactions on Circuits and Systems for Video Technology.

[51]  Mubarak Shah,et al.  Action MACH a spatio-temporal Maximum Average Correlation Height filter for action recognition , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[52]  Alexandros Iosifidis,et al.  Minimum Class Variance Extreme Learning Machine for Human Action Recognition , 2013, IEEE Transactions on Circuits and Systems for Video Technology.