Canonical Correlation Analysis of Video Volume Tensors for Action Categorization and Detection

This paper addresses a spatiotemporal pattern recognition problem: finding a suitable representation and matching scheme for action video volumes so that they can be categorized. A novel method is proposed to measure video-to-video volume similarity by extending Canonical Correlation Analysis (CCA), a principled tool for inspecting linear relations between two sets of vectors, to two multiway data arrays (or tensors). The proposed method takes video volumes directly as inputs, avoiding the difficult problem of explicit motion estimation required by traditional methods, and provides spatiotemporal pattern matching that is robust to intraclass variations of actions. The proposed matching is demonstrated for action classification with a simple Nearest Neighbor classifier. We also propose an automatic action detection method, which performs a 3D window search over an input video using action exemplars. The search is sped up by dynamic learning of subspaces in the proposed CCA. Experiments on a public action data set (KTH) and a self-recorded hand gesture data set showed that the proposed method is significantly more accurate than various state-of-the-art methods. Our method has low time complexity and does not require any major tuning parameters.
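As a point of reference for the classical, vector-based CCA that the method extends, the following is a minimal sketch of computing canonical correlations between two sets of vectors as the cosines of the principal angles between the subspaces they span, using the standard QR-then-SVD procedure. It is not the tensor extension proposed in the paper. The function names, the choice to flatten each frame of a video volume into one column, the absence of mean-centering, and the default `d=3` are illustrative assumptions only.

```python
import numpy as np


def canonical_correlations(X, Y):
    """Cosines of the principal angles between span(X) and span(Y).

    X: (n, p) array, Y: (n, q) array; each column is one vector in R^n.
    Assumes both matrices have full column rank.
    Returns min(p, q) values in [0, 1], sorted in descending order.
    """
    # Orthonormal bases for the two column spaces (reduced QR).
    Qx, _ = np.linalg.qr(X)
    Qy, _ = np.linalg.qr(Y)
    # Singular values of Qx^T Qy are the canonical correlations.
    s = np.linalg.svd(Qx.T @ Qy, compute_uv=False)
    # Guard against tiny numerical overshoot beyond [0, 1].
    return np.clip(s, 0.0, 1.0)


def volume_similarity(A, B, d=3):
    """Hypothetical vector-CCA similarity between two video volumes.

    A, B: arrays of shape (H, W, T_a) and (H, W, T_b) with the same H and W.
    Each frame is flattened into one column, so every volume becomes a set
    of vectors in R^(H*W). The score is the sum of the d largest
    canonical correlations (an assumption made for illustration).
    """
    Xa = A.reshape(-1, A.shape[2])
    Xb = B.reshape(-1, B.shape[2])
    return float(np.sum(canonical_correlations(Xa, Xb)[:d]))
```

Flattening frames discards the joint spatiotemporal structure of the volume, which is the limitation that motivates working with the tensor directly. With a similarity of this kind, a Nearest Neighbor classifier assigns a query volume the label of the exemplar with the highest score.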
