Human action segmentation and classification based on the Isomap algorithm

Visual analysis of human behavior has attracted a great deal of attention in the field of computer vision because of the wide variety of potential applications. Human behavior can be segmented into atomic actions, each of which indicates a single, basic movement. To reduce human intervention in the analysis of human behavior, unsupervised learning may be more suitable than supervised learning. However, the complex nature of human behavior analysis makes unsupervised learning a challenging task. In this paper, we propose a framework for the unsupervised analysis of human behavior based on manifold learning. First, a pairwise human posture distance matrix is derived from a training action sequence. Then, the isometric feature mapping (Isomap) algorithm is applied to construct a low-dimensional structure from the distance matrix. Consequently, the training action sequence is mapped into a manifold trajectory in the Isomap space. To identify the break points between the trajectories of any two successive atomic actions, we represent the manifold trajectory in the Isomap space as a time series of low-dimensional points. A temporal segmentation technique is then applied to segment the time series into sub series, each of which corresponds to an atomic action. Next, the dynamic time warping (DTW) approach is used to cluster atomic action sequences. Finally, we use the clustering results to learn and classify atomic actions according to the nearest neighbor rule. If the distance between the input sequence and the nearest mean sequence is greater than a given threshold, it is regarded as an unknown atomic action. Experiments conducted on real data demonstrate the effectiveness of the proposed method.

[1]  Dariu Gavrila,et al.  The Visual Analysis of Human Movement: A Survey , 1999, Comput. Vis. Image Underst..

[2]  Kuo-Chin Fan,et al.  Motion Flow-Based Video Retrieval , 2007, IEEE Transactions on Multimedia.

[3]  Sheng-Wen Shih,et al.  Learning Atomic Human Actions Using Variable-Length Markov Models , 2009, IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics).

[4]  Jitendra Malik,et al.  Shape matching and object recognition using shape contexts , 2010, 2010 3rd International Conference on Computer Science and Information Technology.

[5]  Jun-Wei Hsieh,et al.  Video-Based Human Movement Analysis and Its Application to Surveillance Systems , 2008, IEEE Transactions on Multimedia.

[6]  Ronen Basri,et al.  Actions as space-time shapes , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[7]  S. Chiba,et al.  Dynamic programming algorithm optimization for spoken word recognition , 1978 .

[8]  Tieniu Tan,et al.  Recent developments in human motion analysis , 2003, Pattern Recognit..

[9]  Stanley T. Birchfield,et al.  Isomap Tracking with Particle Filtering , 2007, 2007 IEEE International Conference on Image Processing.

[10]  Takeo Kanade,et al.  Introduction to the Special Section on Video Surveillance , 2000, IEEE Trans. Pattern Anal. Mach. Intell..

[11]  Vladimir Pavlovic,et al.  Toward multimodal human-computer interface , 1998, Proc. IEEE.

[12]  Mikhail Belkin,et al.  Laplacian Eigenmaps for Dimensionality Reduction and Data Representation , 2003, Neural Computation.

[13]  Nanning Zheng,et al.  Unsupervised Analysis of Human Gestures , 2001, IEEE Pacific Rim Conference on Multimedia.

[14]  Juan Carlos Niebles,et al.  Unsupervised Learning of Human Action Categories Using Spatial-Temporal Words , 2008, International Journal of Computer Vision.

[15]  Inderjit S. Dhillon,et al.  Co-clustering documents and words using bipartite spectral graph partitioning , 2001, KDD '01.

[16]  Michael C. Hout,et al.  Multidimensional Scaling , 2003, Encyclopedic Dictionary of Archaeology.

[17]  Eraldo Ribeiro,et al.  Human Motion Recognition Using Isomap and Dynamic Time Warping , 2007, Workshop on Human Motion.

[18]  J. Tenenbaum,et al.  A global geometric framework for nonlinear dimensionality reduction. , 2000, Science.

[19]  B. Moor,et al.  Subspace angles and distances between ARMA models , 2000 .

[20]  S T Roweis,et al.  Nonlinear dimensionality reduction by locally linear embedding. , 2000, Science.

[21]  Hisashi Miyamori,et al.  Video annotation for content-based retrieval using human behavior analysis and domain knowledge , 2000, Proceedings Fourth IEEE International Conference on Automatic Face and Gesture Recognition (Cat. No. PR00580).

[22]  Alex Pentland,et al.  Pfinder: Real-Time Tracking of the Human Body , 1997, IEEE Trans. Pattern Anal. Mach. Intell..

[23]  Biing-Hwang Juang,et al.  Fundamentals of speech recognition , 1993, Prentice Hall signal processing series.

[24]  I.H. Witten,et al.  On-line and off-line heuristics for inferring hierarchies of repetitions in sequences , 2000, Proceedings of the IEEE.

[25]  Ahmed M. Elgammal,et al.  Inferring 3D body pose from silhouettes using activity manifold learning , 2004, Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2004. CVPR 2004..

[26]  Anil K. Jain,et al.  Incremental nonlinear dimensionality reduction by manifold learning , 2006, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[27]  Jake K. Aggarwal,et al.  Human Motion Analysis: A Review , 1999, Comput. Vis. Image Underst..

[28]  Rama Chellappa,et al.  From Videos to Verbs: Mining Videos for Activities using a Cascade of Dynamical Systems , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[29]  Anil K. Jain,et al.  Data clustering: a review , 1999, CSUR.

[30]  Jianbo Shi,et al.  Detecting unusual activity in video , 2004, Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2004. CVPR 2004..

[31]  Hongbin Zha,et al.  Riemannian Manifold Learning , 2008, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[32]  Octavia I. Camps,et al.  Modeling Correspondences for Multi-Camera Tracking Using Nonlinear Manifold Learning and Target Dynamics , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[33]  Juan Carlos Niebles,et al.  Unsupervised Learning of Human Action Categories Using Spatial-Temporal Words , 2006, BMVC.

[34]  Liang Wang,et al.  Visual learning and recognition of sequential data manifolds with applications to human movement analysis , 2008, Comput. Vis. Image Underst..