Action recognition using rank-1 approximation of Joint Self-Similarity Volume

In this paper, we make three main contributions to action recognition. (i) We introduce the Joint Self-Similarity Volume (Joint SSV) for modeling dynamical systems, and show that a new optimized rank-1 tensor approximation of the Joint SSV yields compact low-dimensional descriptors that accurately preserve the dynamics of the original system, e.g., an action video sequence. (ii) The descriptor vectors derived from the optimized rank-1 approximation make it possible to recognize actions without explicitly aligning action sequences that differ in execution speed or frame rate. (iii) The method is generic and can be applied with different low-level features, such as silhouettes or histograms of oriented gradients; hence, it does not necessarily require explicit tracking of features in the space-time volume. Experimental results on three public datasets show that our method outperforms all baseline methods.
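
To make the idea of a rank-1 tensor approximation concrete, the following is a minimal sketch, not the optimized procedure introduced in the paper. It assumes a generic 3-way array in place of a Joint SSV (whose construction is not described in this abstract) and uses the standard higher-order power method, a swapped-in textbook technique. The function name `rank1_approx` and its parameters are hypothetical and chosen only for illustration.

```python
import numpy as np

def rank1_approx(T, n_iter=100, tol=1e-10, seed=0):
    """Rank-1 approximation T ~ lam * (u outer v outer w) of a 3-way array.

    Uses the standard higher-order power method (an assumption of this
    sketch; the paper's own optimized procedure is not reproduced here).
    """
    rng = np.random.default_rng(seed)
    # Random unit-norm starting vectors, one per tensor mode.
    u, v, w = (rng.standard_normal(n) for n in T.shape)
    u, v, w = (x / np.linalg.norm(x) for x in (u, v, w))
    lam = 0.0
    for _ in range(n_iter):
        # Update each factor by contracting T with the other two factors,
        # then renormalise to unit length.
        u = np.einsum('ijk,j,k->i', T, v, w)
        u /= np.linalg.norm(u)
        v = np.einsum('ijk,i,k->j', T, u, w)
        v /= np.linalg.norm(v)
        w = np.einsum('ijk,i,j->k', T, u, v)
        new_lam = np.linalg.norm(w)  # scale of the rank-1 term
        w /= new_lam
        converged = abs(new_lam - lam) < tol
        lam = new_lam
        if converged:
            break
    return lam, u, v, w

# Toy input: an exactly rank-1 tensor with known scale 3.0.
a = np.array([1.0, 2.0, 2.0]) / 3.0          # unit vector
b = np.array([3.0, 4.0]) / 5.0               # unit vector
c = np.array([1.0, 1.0, 1.0, 1.0]) / 2.0     # unit vector
T = 3.0 * np.einsum('i,j,k->ijk', a, b, c)

lam, u, v, w = rank1_approx(T)
T_hat = lam * np.einsum('i,j,k->ijk', u, v, w)  # reconstructed tensor
```

In this sketch the three factor vectors `u`, `v`, `w` stand in for the compact descriptors mentioned above: a volume with `I * J * K` entries is summarised by `I + J + K` numbers plus one scale. How such vectors are compared across sequences of different lengths is left to the paper itself.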
