Spatial-Temporal correlatons for unsupervised action classification

Spatial-temporal local motion features have shown promising results in complex human action classification. Most of the previous works [6],[16],[21] treat these spatial- temporal features as a bag of video words, omitting any long range, global information in either the spatial or temporal domain. Other ways of learning temporal signature of motion tend to impose a fixed trajectory of the features or parts of human body returned by tracking algorithms. This leaves little flexibility for the algorithm to learn the optimal temporal pattern describing these motions. In this paper, we propose the usage of spatial-temporal correlograms to encode flexible long range temporal information into the spatial-temporal motion features. This results into a much richer description of human actions. We then apply an unsupervised generative model to learn different classes of human actions from these ST-correlograms. KTH dataset, one of the most challenging and popular human action dataset, is used for experimental evaluation. Our algorithm achieves the highest classification accuracy reported for this dataset under an unsupervised learning scheme.

[1]  P. Dodwell Visual Pattern Recognition , 1970 .

[2]  Jing Huang,et al.  Image indexing using color correlograms , 1997, Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[3]  M. Tarr Visual Pattern Recognition , 1998 .

[4]  Yang Song,et al.  Unsupervised Learning of Human Motion , 2003, IEEE Trans. Pattern Anal. Mach. Intell..

[5]  Jitendra Malik,et al.  Recognizing action at a distance , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[6]  B. Frey,et al.  Transformation-Invariant Clustering Using the EM Algorithm , 2003, IEEE Trans. Pattern Anal. Mach. Intell..

[7]  Michael I. Jordan,et al.  Latent Dirichlet Allocation , 2001, J. Mach. Learn. Res..

[8]  Barbara Caputo,et al.  Recognizing human actions: a local SVM approach , 2004, Proceedings of the 17th International Conference on Pattern Recognition, 2004. ICPR 2004..

[9]  Ronen Basri,et al.  Actions as space-time shapes , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[10]  Martial Hebert,et al.  Efficient visual event detection using volumetric features , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[11]  Pietro Perona,et al.  Hybrid models for human motion recognition , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[12]  Eli Shechtman,et al.  Space-time behavior based correlation , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[13]  Mubarak Shah,et al.  Actions sketch: a novel action representation , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[14]  Pietro Perona,et al.  A Bayesian hierarchical model for learning natural scene categories , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[15]  Serge J. Belongie,et al.  Behavior recognition via sparse spatio-temporal features , 2005, 2005 IEEE International Workshop on Visual Surveillance and Performance Evaluation of Tracking and Surveillance.

[16]  Alexei A. Efros,et al.  Discovering objects and their location in images , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[17]  Silvio Savarese,et al.  Discriminative Object Class Models of Appearance and Shape by Correlatons , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[18]  Juan Carlos Niebles,et al.  Unsupervised Learning of Human Action Categories Using Spatial-Temporal Words , 2006, BMVC.

[19]  Ze-Nian Li,et al.  Successive Convex Matching for Action Detection , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[20]  Martial Hebert,et al.  Spatio-temporal Shape and Flow Correlation for Action Recognition , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[21]  Tae-Kyun Kim,et al.  Learning Motion Categories using both Semantic and Structural Information , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[22]  Tae-Kyun Kim,et al.  Tensor Canonical Correlation Analysis for Action Classification , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[23]  Jitendra Malik,et al.  Shape matching and object recognition using shape contexts , 2010, 2010 3rd International Conference on Computer Science and Information Technology.

[24]  M.,et al.  Statistical and Structural Approaches to Texture , 2022 .