A Tensor-Driven Temporal Correlation Model for Video Sequence Classification

Video sequence classification is a core task in computer vision. This letter proposes a tensor decomposition method, called tensor-driven temporal correlation, that takes general tensors as input for video sequence classification. Because tensor representations of video sequences may contain distortion and redundancy, the proposed formulation projects the original tensor into subspaces spanned by spatial basis matrices. Moreover, to better preserve temporal smoothness between consecutive slices of the tensor, the basis matrices are jointly learned through an autoregressive model. An experiment on the widely used Cambridge hand-gesture database shows that the proposed method converges within a small number of iterations during training and achieves promising results compared with state-of-the-art methods.
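
The two ingredients named in the abstract, projection onto spatial basis matrices and an autoregressive link between consecutive slices, can be illustrated with a minimal NumPy sketch. This is not the letter's algorithm: the abstract does not give the objective function or the update rules, so the function names `project_frames` and `ar_residual`, the tensor sizes, the random orthonormal bases, and the fixed AR coefficients are all assumptions chosen for illustration.

```python
import numpy as np

def project_frames(video, U1, U2):
    """Project every H x W frame onto spatial bases U1 (H x p) and U2 (W x q).

    video has shape (H, W, T); the result has shape (p, q, T), where
    slice t equals U1.T @ video[:, :, t] @ U2.
    """
    return np.einsum("hp,hwt,wq->pqt", U1, video, U2)

def ar_residual(core, coeffs):
    """Sum of squared errors when each slice is predicted from earlier ones.

    coeffs[k] weights the slice (k + 1) steps in the past (assumed, fixed
    AR coefficients; a real method would estimate them).
    """
    order = len(coeffs)
    err = 0.0
    for t in range(order, core.shape[2]):
        pred = sum(c * core[:, :, t - k - 1] for k, c in enumerate(coeffs))
        err += np.sum((core[:, :, t] - pred) ** 2)
    return float(err)

# Hypothetical sizes: 12 x 10 frames, 8 time steps, 4 x 3 projected slices.
rng = np.random.default_rng(0)
H, W, T, p, q = 12, 10, 8, 4, 3
video = rng.standard_normal((H, W, T))

# Random orthonormal bases stand in for learned ones.
U1, _ = np.linalg.qr(rng.standard_normal((H, p)))
U2, _ = np.linalg.qr(rng.standard_normal((W, q)))

core = project_frames(video, U1, U2)
smoothness_penalty = ar_residual(core, coeffs=[1.0])
print(core.shape, smoothness_penalty)
```

In a joint-learning scheme of the kind the abstract describes, a quantity like `smoothness_penalty` would plausibly enter the objective alongside a reconstruction term, so that the bases are pushed toward projections whose slices evolve smoothly over time.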
