A Matrix-Based Approach to Unsupervised Human Action Categorization

Human action, as the basic unit of most human-relevant video content, bridges the gap between low-level visual features and high-level semantics. Human action recognition is of great significance for applications such as human-computer interaction, intelligent video surveillance, and video retrieval and search. In this paper, we propose a novel unsupervised approach to mining categories from action video sequences. It consists of two modules: an action representation that structures the video data, and a learning model for unsupervised categorization. In the action representation, we present a novel view of video decomposition. Videos are regarded as spatially distributed dynamic pixel time series, and these dynamic pixels are first quantized into pixel prototypes. After the pixel time series are replaced with their corresponding prototype labels, each video sequence is compressed into a two-dimensional action matrix. In the learning model, we stack these matrices to form a multi-action tensor and propose a joint matrix factorization method that simultaneously clusters the pixel prototypes into pixel signatures and the action matrices into action classes, taking into account the duality between pixel clustering and action clustering. The approach is tested on the public Weizmann and KTH datasets, and promising results are achieved.
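
The action representation step can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes grayscale videos of equal size and length, uses raw intensity series as pixel features, and substitutes plain k-means as the quantizer. The names `kmeans`, `action_matrices`, and `n_prototypes` are hypothetical. The joint matrix factorization step is not shown.

```python
import numpy as np

def kmeans(X, k, n_iter=20, seed=0):
    """Plain Lloyd's k-means (a stand-in quantizer, assumed for illustration)."""
    rng = np.random.default_rng(seed)
    # Initialise centroids from k distinct rows of X.
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        # Squared Euclidean distance from every row to every centroid.
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for j in range(k):
            members = X[labels == j]
            if len(members):  # leave an empty cluster's centroid unchanged
                centroids[j] = members.mean(axis=0)
    # Final assignment against the updated centroids.
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return centroids, d2.argmin(axis=1)

def action_matrices(videos, n_prototypes=8, seed=0):
    """Compress videos of shape (n_videos, T, H, W) into prototype-label matrices.

    Returns an integer array of shape (n_videos, H, W); stacking the
    per-video matrices this way gives a simple multi-action tensor.
    """
    n, T, H, W = videos.shape
    # One row per pixel: its intensity over the T frames.
    series = videos.transpose(0, 2, 3, 1).reshape(-1, T).astype(float)
    # Quantize all pixel time series jointly so labels are shared across videos.
    _, labels = kmeans(series, n_prototypes, seed=seed)
    return labels.reshape(n, H, W)
```

Calling `action_matrices(videos, n_prototypes=4)` on an array of shape `(3, 10, 6, 5)` would return three 6 x 5 label matrices with entries in `{0, 1, 2, 3}`. Quantizing jointly over all videos is a design choice made here so that the same label means the same prototype in every action matrix.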
