Randomized time warping for motion recognition

Abstract: Dynamic time warping (DTW) has been widely used for the alignment and comparison of two sequential patterns. In DTW, dynamic programming is used to avoid an exhaustive search for the alignment. In this paper, we propose a randomized extension of the DTW concept, termed randomized time warping (RTW), for motion recognition. RTW generates time-elastic (TE) features by randomly sampling the sequential data while retaining the temporal order. A set of TE features is represented by a low-dimensional subspace, called the sequence hypothesis (Hypo) subspace, and the similarity between two sequential patterns is defined by the canonical angles between the two corresponding Hypo subspaces. In essence, RTW simultaneously computes multiple degrees of similarity between many candidate pairs of warped patterns; in practice, RTW generalizes the Hankel matrix commonly used in modeling system dynamics. We demonstrate the applicability of RTW through gesture recognition experiments on three public datasets: the Cambridge gesture database, a subset of the one-shot-learning dataset from the ChaLearn Gesture Challenge, and the KTH action dataset.
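The pipeline described in the abstract (order-preserving random sampling, a subspace fitted to the sampled features, and comparison by canonical angles) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The function names, the use of uncentered SVD to obtain the subspace, and the choice of the mean squared cosine of the canonical angles as the similarity score are all assumptions made for illustration.

```python
import numpy as np


def te_features(seq, n_samples, n_features, rng):
    """Build hypothetical time-elastic (TE) features from a (T, d) sequence.

    Each feature concatenates n_samples frames picked at random, with the
    picked indices sorted so that the temporal order is retained.
    Returns an array of shape (n_samples * d, n_features).
    """
    T, _ = seq.shape
    columns = []
    for _ in range(n_features):
        idx = np.sort(rng.choice(T, size=n_samples, replace=False))
        columns.append(seq[idx].reshape(-1))
    return np.stack(columns, axis=1)


def hypo_subspace(features, dim):
    """Orthonormal basis of a dim-dimensional subspace for the TE features.

    Assumption: the basis is taken from the leading left singular vectors
    (uncentered PCA).
    """
    u, _, _ = np.linalg.svd(features, full_matrices=False)
    return u[:, :dim]


def subspace_similarity(basis_a, basis_b):
    """Similarity from canonical angles between two subspaces.

    The singular values of basis_a.T @ basis_b are the cosines of the
    canonical angles. Assumption: the score is their mean squared value.
    """
    cosines = np.linalg.svd(basis_a.T @ basis_b, compute_uv=False)
    cosines = np.clip(cosines, 0.0, 1.0)
    return float(np.mean(cosines ** 2))


def rtw_similarity(seq_a, seq_b, n_samples=5, n_features=50, dim=3, seed=0):
    """Compare two (T, d) sequences, which may differ in length T."""
    rng = np.random.default_rng(seed)
    basis_a = hypo_subspace(te_features(seq_a, n_samples, n_features, rng), dim)
    basis_b = hypo_subspace(te_features(seq_b, n_samples, n_features, rng), dim)
    return subspace_similarity(basis_a, basis_b)
```

Because each feature has a fixed length of `n_samples * d` regardless of the sequence length, two sequences of different durations can be compared directly, for example `rtw_similarity(a, b)` with `a` of shape `(30, 4)` and `b` of shape `(45, 4)`. The parameter defaults here are arbitrary placeholders, not values taken from the paper.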
