Hierarchical Aligned Cluster Analysis (HACA) for Temporal Segmentation of Human Motion

Temporal segmentation of human motion into plausible motion primitives is central to understanding human motion and to building computational models of it. Several issues make this segmentation challenging: the large variability in the temporal scale and periodicity of human actions, the complexity of representing articulated motion, and the exponentially large number of possible movement combinations. We formulate temporal segmentation as an extension of standard kernel k-means clustering and derive an unsupervised, hierarchical, bottom-up framework called Hierarchical Aligned Cluster Analysis (HACA). HACA extends standard kernel k-means clustering in three ways: (1) it allows the cluster means to contain a variable number of features, (2) it introduces a generalized dynamic time warping (DTW) kernel as a temporal metric between sequences, and (3) it incorporates parameters that let the user specify the time granularity of the motion primitives in the hierarchical decomposition. Experimental results on motion capture data and video demonstrate the effectiveness of HACA for decomposing complex human motions.
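The segment-level clustering idea can be sketched in Python. This is a simplified illustration under stated assumptions, not the HACA implementation. It uses a Gaussian frame kernel with bandwidth `sigma`, a DTW-style alignment kernel written in the form of the dynamic time-alignment kernel (DTAK), and standard kernel k-means over segments whose boundaries are supplied by the caller. HACA itself also searches over segment boundaries and builds a hierarchy over time granularities; both steps are omitted here, and the exact kernel HACA uses may differ. All function names (`frame_kernel`, `dtak`, `init_labels`, `kernel_kmeans`, `cluster_segments`), the farthest-first initialization, and the toy data are hypothetical.

```python
import numpy as np


def frame_kernel(X, Y, sigma=1.0):
    """Gaussian (RBF) similarity between every frame of X (n_x, d) and of Y (n_y, d)."""
    sq = (np.sum(X**2, axis=1)[:, None] + np.sum(Y**2, axis=1)[None, :]
          - 2.0 * X @ Y.T)
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma**2))


def dtak(X, Y, sigma=1.0):
    """DTW-style alignment kernel in the DTAK form (assumed here): the best
    monotone alignment path sums frame similarities (diagonal steps count
    twice), and the total is divided by n_x + n_y so that segments of
    different lengths are comparable. Identical segments score about 1."""
    K = frame_kernel(X, Y, sigma)
    nx, ny = K.shape
    u = np.full((nx + 1, ny + 1), -np.inf)
    u[0, 0] = 0.0
    for i in range(1, nx + 1):
        for j in range(1, ny + 1):
            k = K[i - 1, j - 1]
            u[i, j] = max(u[i - 1, j] + k,
                          u[i - 1, j - 1] + 2.0 * k,
                          u[i, j - 1] + k)
    return u[nx, ny] / (nx + ny)


def init_labels(G, k):
    """Deterministic farthest-first seeding in kernel feature space (hypothetical choice)."""
    d = np.diag(G)
    seeds = [0]
    for _ in range(1, k):
        # squared feature-space distance from every segment to each chosen seed
        D = d[:, None] - 2.0 * G[:, seeds] + d[seeds][None, :]
        seeds.append(int(D.min(axis=1).argmax()))
    D = d[:, None] - 2.0 * G[:, seeds] + d[seeds][None, :]
    return D.argmin(axis=1)


def kernel_kmeans(G, k, n_iter=50):
    """Standard kernel k-means on a precomputed (m, m) segment kernel matrix G."""
    labels = init_labels(G, k)
    for _ in range(n_iter):
        dist = np.full((G.shape[0], k), np.inf)
        for c in range(k):
            idx = np.flatnonzero(labels == c)
            if idx.size == 0:
                continue  # empty cluster: its distances stay at +inf
            # ||phi(s_i) - mean_c||^2 = G_ii - 2 mean_j G_ij + mean_{j,l} G_jl
            dist[:, c] = (np.diag(G) - 2.0 * G[:, idx].mean(axis=1)
                          + G[np.ix_(idx, idx)].mean())
        new_labels = dist.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    return labels


def cluster_segments(X, cuts, k, sigma=1.0):
    """Cluster the segments X[cuts[t]:cuts[t+1]] (boundaries given) into k groups."""
    segs = [X[a:b] for a, b in zip(cuts[:-1], cuts[1:])]
    m = len(segs)
    G = np.empty((m, m))
    for a in range(m):
        for b in range(a, m):
            G[a, b] = G[b, a] = dtak(segs[a], segs[b], sigma)
    return kernel_kmeans(G, k)


# Toy 2-D "motion": a circular primitive and a holding pose, each repeated
# with a different duration. Purely synthetic, for illustration only.
def circle(n):
    t = np.linspace(0.0, 2.0 * np.pi, n)
    return np.stack([np.cos(t), np.sin(t)], axis=1)


def still(n):
    return np.full((n, 2), 3.0)


X = np.vstack([circle(20), still(25), circle(30), still(15)])
labels = cluster_segments(X, cuts=[0, 20, 45, 75, 90], k=2)
print(labels)  # expected: [0 1 0 1]
```

Because the kernel aligns frames before comparing them and normalizes by the total length of both segments, the 20-frame and 30-frame circular segments land in the same cluster despite their different durations.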
