Dual clustering for categorization of action sequences

This paper proposes a novel algorithm for categorizing action video sequences using unsupervised dual clustering. Given a video database, we extract motion information of actions and perform nonlinear dimensionality reduction to address both the high dimensionality of silhouette features and the non-linearity of articulated human actions. k-means clustering is first performed on frame-wise features in the embedding space to convert each video in the database to a sequence of labels, each of which corresponds to one of k "key" feature frames. The dissimilarity between any two label sequences is then measured using edit distance. The resulting pairwise dissimilarity matrix is finally passed to a spectral clustering algorithm to obtain the category label of each action video. Experimental results on two recent data sets demonstrate the effectiveness and efficiency of the proposed algorithm.
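
The sequence-comparison step described above can be illustrated with a minimal sketch. It assumes each video has already been converted to a list of integer key-frame labels by the first (k-means) clustering stage, and it uses the standard Levenshtein edit distance with unit costs for insertion, deletion, and substitution. The unit costs, the function names `edit_distance` and `pairwise_dissimilarity`, and the toy label sequences are illustrative assumptions, not details taken from the paper. Feature extraction, the nonlinear embedding, k-means, and spectral clustering are not shown.

```python
from typing import List, Sequence


def edit_distance(a: Sequence[int], b: Sequence[int]) -> int:
    """Levenshtein distance between two label sequences (unit costs assumed)."""
    # prev[j] is the distance between the labels of `a` processed so far
    # (initially none) and the first j labels of `b`.
    prev = list(range(len(b) + 1))
    for i, label_a in enumerate(a, start=1):
        curr = [i]  # distance from the first i labels of `a` to an empty sequence
        for j, label_b in enumerate(b, start=1):
            substitution = prev[j - 1] + (label_a != label_b)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1]


def pairwise_dissimilarity(sequences: Sequence[Sequence[int]]) -> List[List[int]]:
    """Symmetric matrix of edit distances between all pairs of label sequences."""
    n = len(sequences)
    matrix = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = edit_distance(sequences[i], sequences[j])
            matrix[i][j] = matrix[j][i] = d
    return matrix


# Hypothetical label sequences for three videos (toy data, not from the paper).
videos = [
    [0, 0, 1, 2, 2],
    [0, 1, 1, 2],
    [3, 3, 4, 4, 3],
]
D = pairwise_dissimilarity(videos)
```

In this toy example the first two sequences share most of their key-frame labels and so end up much closer to each other than to the third. A matrix like `D` is the kind of input the second clustering stage would consume, typically after conversion to an affinity matrix. How that conversion is done is not stated in the abstract.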
