A spatial-temporal iterative tensor decomposition technique for action and gesture recognition

Classification of video sequences is an important task with many applications, including video search and action recognition. Unlike traditional approaches that transform the original video sequences into visual feature vectors, tensor-based methods have been proposed that classify video sequences using a natural multi-way representation of the original data. However, a clear limitation of tensor-based methods is that the input video sequences usually have to be preprocessed to a uniform temporal length. In this paper, we propose Spatial-Temporal Iterative Tensor Decomposition (S-TITD), a technique for classifying video sequences of unequal temporal length by bringing them to a uniform length. The proposed framework consists of two main steps. First, each video sequence is represented as a third-order tensor, and a Tucker-2 decomposition is performed to obtain a reduced-dimension core tensor. Second, the third (temporal) mode of the core tensor is encoded to a uniform length by adaptively selecting its most informative slices. Both steps are embedded in a dynamic learning framework so that the method can update its results over time. We conduct a series of experiments on three public gesture and action recognition datasets, and the results show that S-TITD achieves better performance than state-of-the-art algorithms.
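The two steps above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation and rests on several assumptions: each video is stored as a height × width × frames array; the Tucker-2 factors come from a truncated HOSVD refined by standard HOOI sweeps (a textbook scheme, not necessarily the S-TITD iteration); and a slice's "informativeness" is taken to be its Frobenius norm, with uniform resampling used when a sequence is shorter than the target length. The function names `tucker2` and `uniform_length` are hypothetical, and the dynamic (over-time) update is not shown.

```python
import numpy as np


def _leading_left_vectors(mat, rank):
    """Return the `rank` leading left singular vectors of a matrix."""
    u, _, _ = np.linalg.svd(mat, full_matrices=False)
    return u[:, :rank]


def tucker2(x, r1, r2, n_iter=10):
    """Tucker-2 decomposition of a third-order tensor x with shape (I1, I2, T).

    Only modes 1 and 2 (assumed spatial) are compressed; mode 3 (time) keeps
    its original length T. Factors are initialised by truncated HOSVD and
    refined with standard HOOI sweeps (an assumption, not the S-TITD update).
    """
    i1, i2, _ = x.shape
    # Mode-n unfoldings: put mode n first, flatten the remaining modes.
    u1 = _leading_left_vectors(x.reshape(i1, -1), r1)
    u2 = _leading_left_vectors(np.moveaxis(x, 1, 0).reshape(i2, -1), r2)
    for _ in range(n_iter):
        # Fix U2, project mode 2, and refresh U1 from the mode-1 unfolding.
        y = np.einsum("abt,bj->ajt", x, u2)
        u1 = _leading_left_vectors(y.reshape(i1, -1), r1)
        # Fix U1, project mode 1, and refresh U2 from the mode-2 unfolding.
        y = np.einsum("abt,ai->ibt", x, u1)
        u2 = _leading_left_vectors(np.moveaxis(y, 1, 0).reshape(i2, -1), r2)
    # Core tensor G = X x_1 U1^T x_2 U2^T with shape (r1, r2, T).
    core = np.einsum("abt,ai,bj->ijt", x, u1, u2)
    return core, u1, u2


def uniform_length(core, length):
    """Map a core tensor (r1, r2, T) to (r1, r2, length) along the time mode.

    Hypothetical criterion: a frontal slice's "informativeness" is its
    Frobenius norm. If T >= length, keep the `length` highest-energy slices
    in temporal order; otherwise repeat slices via uniform resampling.
    """
    t = core.shape[2]
    if t >= length:
        energy = np.linalg.norm(core, axis=(0, 1))  # one score per slice
        idx = np.sort(np.argsort(energy)[::-1][:length])
    else:
        idx = np.round(np.linspace(0, t - 1, length)).astype(int)
    return core[:, :, idx], idx


rng = np.random.default_rng(0)
# Two synthetic "videos" (height x width x frames) of unequal length.
videos = [rng.standard_normal((16, 12, n)) for n in (20, 5)]
features = []
for v in videos:
    core, u1, u2 = tucker2(v, r1=4, r2=3)
    encoded, _ = uniform_length(core, length=8)
    features.append(encoded.ravel())  # fixed-size vector for any classifier
print([f.shape for f in features])  # [(96,), (96,)]
```

Whatever the original lengths, every video ends up as a vector of the same size (here 4 × 3 × 8 = 96), which is the property that lets a standard classifier consume sequences of unequal duration. Keeping the selected slices in temporal order, rather than in order of energy, is a design choice meant to preserve the ordering of the motion.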