Hierarchical Aligned Cluster Analysis for Temporal Clustering of Human Motion

Temporal segmentation of human motion into plausible motion primitives is central to understanding and building computational models of human motion. Several issues contribute to the challenge of discovering motion primitives: the exponential nature of all possible movement combinations, the variability in the temporal scale of human actions, and the complexity of representing articulated motion. We pose the problem of learning motion primitives as one of temporal clustering, and derive an unsupervised hierarchical bottom-up framework called hierarchical aligned cluster analysis (HACA). HACA finds a partition of a given multidimensional time series into m disjoint segments such that each segment belongs to one of k clusters. HACA combines kernel k-means with the generalized dynamic time alignment kernel to cluster time series data. Moreover, it provides a natural framework to find a low-dimensional embedding for time series. HACA is efficiently optimized with a coordinate descent strategy and dynamic programming. Experimental results on motion capture and video data demonstrate the effectiveness of HACA for segmenting complex motions and as a visualization tool. We also compare the performance of HACA to state-of-the-art algorithms for temporal clustering on data of a honey bee dance. The HACA code is available online.

[1]  J. MacQueen Some methods for classification and analysis of multivariate observations , 1967 .

[2]  Takeo Kanade,et al.  An Iterative Image Registration Technique with an Application to Stereo Vision , 1981, IJCAI.

[3]  Shokri Z. Selim,et al.  K-Means-Type Algorithms: A Generalized Convergence Theorem and Characterization of Local Optimality , 1984, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[4]  Dimitri P. Bertsekas,et al.  Dynamic Programming and Optimal Control, Two Volume Set , 1995 .

[5]  Mari Ostendorf,et al.  From HMM's to segment models: a unified view of stochastic modeling for speech recognition , 1996, IEEE Trans. Speech Audio Process..

[6]  Jitendra Malik,et al.  Normalized cuts and image segmentation , 1997, Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[7]  Christos Faloutsos,et al.  Efficient retrieval of similar time sequences under time warping , 1998, Proceedings 14th International Conference on Data Engineering.

[8]  Alexander J. Smola,et al.  Learning with kernels , 1998 .

[9]  Mauro Dell'Amico,et al.  Assignment Problems , 1998, IFIP Congress: Fundamentals - Foundations of Computer Science.

[10]  Zoubin Ghahramani,et al.  A Unifying Review of Linear Gaussian Models , 1999, Neural Computation.

[11]  Dariu Gavrila,et al.  The Visual Analysis of Human Movement: A Survey , 1999, Comput. Vis. Image Underst..

[12]  Eamonn J. Keogh,et al.  Scaling up dynamic time warping for datamining applications , 2000, KDD '00.

[13]  J. Tenenbaum,et al.  A global geometric framework for nonlinear dimensionality reduction. , 2000, Science.

[14]  Xavier Binefa,et al.  Robust Real-Time Periodic Motion Detection, Analysis, and Applications , 2000, IEEE Trans. Pattern Anal. Mach. Intell..

[15]  R. Bowden Learning Statistical Models of Human Motion , 2000 .

[16]  Vladimir Pavlovic,et al.  Learning Switching Linear Models of Human Motion , 2000, NIPS.

[17]  Yong Rui,et al.  Segmenting visual actions based on spatio-temporal motion patterns , 2000, Proceedings IEEE Conference on Computer Vision and Pattern Recognition. CVPR 2000 (Cat. No.PR00662).

[18]  Chris H. Q. Ding,et al.  Spectral Relaxation for K-means Clustering , 2001, NIPS.

[19]  Bernhard Schölkopf,et al.  Learning with kernels , 2001 .

[20]  Eamonn J. Keogh,et al.  An online algorithm for segmenting time series , 2001, Proceedings 2001 IEEE International Conference on Data Mining.

[21]  Steven Kay,et al.  Fundamentals Of Statistical Signal Processing , 2001 .

[22]  Michael I. Jordan,et al.  On Spectral Clustering: Analysis and an algorithm , 2001, NIPS.

[23]  Shigeki Sagayama,et al.  Dynamic Time-Alignment Kernel in Support Vector Machine , 2001, NIPS.

[24]  Maja J. Mataric,et al.  Automated Derivation of Primitives for Movement Classification , 2000, Auton. Robots.

[25]  Jessica K. Hodgins,et al.  Interactive control of avatars animated with human motion data , 2002, SIGGRAPH.

[26]  Maja J. Mataric,et al.  Deriving action and behavior primitives from human motion data , 2002, IEEE/RSJ International Conference on Intelligent Robots and Systems.

[27]  Pietro Perona,et al.  Decomposition of human motion into dynamics-based primitives with application to drawing tasks , 2003, Autom..

[28]  Jitendra Malik,et al.  Recognizing action at a distance , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[29]  Bobby Bodenheimer,et al.  An evaluation of a cost metric for selecting transitions between motion segments , 2003, SCA '03.

[30]  Barbara Caputo,et al.  Recognizing human actions: a local SVM approach , 2004, ICPR 2004.

[31]  Jianbo Shi,et al.  Detecting unusual activity in video , 2004, Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2004. CVPR 2004..

[32]  Jernej Barbic,et al.  Segmenting Motion Capture Data into Distinct Behaviors , 2004, Graphics Interface.

[33]  Inderjit S. Dhillon,et al.  Kernel k-means: spectral clustering and normalized cuts , 2004, KDD.

[34]  Barbara Caputo,et al.  Recognizing human actions: a local SVM approach , 2004, Proceedings of the 17th International Conference on Pattern Recognition, 2004. ICPR 2004..

[35]  Nicola J. Ferrier,et al.  Repetitive motion analysis: segmentation and event classification , 2004, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[36]  Maja J. Mataric,et al.  A spatio-temporal extension to Isomap nonlinear dimension reduction , 2004, ICML.

[37]  Ronen Basri,et al.  Actions as space-time shapes , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[38]  T. Warren Liao,et al.  Clustering of time series data - a survey , 2005, Pattern Recognit..

[39]  Amnon Shashua,et al.  A unifying approach to hard and probabilistic clustering , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[40]  Bernard Haasdonk,et al.  Feature space interpretation of SVMs with indefinite kernels , 2005, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[41]  Tido Röder,et al.  Efficient content-based retrieval of motion capture data , 2005, SIGGRAPH 2005.

[42]  Manuel Davy,et al.  An online kernel change detection algorithm , 2005, IEEE Transactions on Signal Processing.

[43]  Eugene Fiume,et al.  An efficient search algorithm for motion data using weighted PCA , 2005, SCA '05.

[44]  Meinard Müller,et al.  Efficient content-based retrieval of motion capture data , 2005, ACM Trans. Graph..

[45]  Yiannis Aloimonos,et al.  Understanding visuo‐motor primitives for motion synthesis and analysis , 2006, Comput. Animat. Virtual Worlds.

[46]  Takeo Kanade,et al.  Discriminative cluster analysis , 2006, ICML.

[47]  Lihi Zelnik-Manor,et al.  Statistical analysis of dynamic actions , 2006, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[48]  Paul Fearnhead,et al.  Exact and efficient Bayesian inference for multiple changepoint problems , 2006, Stat. Comput..

[49]  Ramakant Nevatia,et al.  Recognition and Segmentation of 3-D Human Action Using HMM and Multi-class AdaBoost , 2006, ECCV.

[50]  Guodong Liu,et al.  Segment-based human motion compression , 2006, SCA '06.

[51]  Adrian Hilton,et al.  A survey of advances in vision-based human motion capture and analysis , 2006, Comput. Vis. Image Underst..

[52]  Juan Carlos Niebles,et al.  Unsupervised Learning of Human Action Categories Using Spatial-Temporal Words , 2006, BMVC.

[53]  Tomoko Matsui,et al.  A Kernel for Time Series Based on Global Alignments , 2006, 2007 IEEE International Conference on Acoustics, Speech and Signal Processing - ICASSP '07.

[54]  Irfan A. Essa,et al.  Discovering Multivariate Motifs using Subsequence Density Estimation and Greedy Mixture Learning , 2007, AAAI.

[55]  Jürgen Kurths,et al.  Recurrence plots for the analysis of complex systems , 2009 .

[56]  Fernando De la Torre,et al.  Multimodal Diaries , 2007, 2007 IEEE International Conference on Multimedia and Expo.

[57]  James M. Rehg,et al.  Learning and Inferring Motion Patterns using Parametric Segmental Switching Linear Dynamic Systems , 2008, International Journal of Computer Vision.

[58]  Yiannis Aloimonos,et al.  A Language for Human Action , 2007, Computer.

[59]  Kevin P. Murphy,et al.  Modeling changing dependency structure in multivariate time series , 2007, ICML '07.

[60]  Fernando De la Torre,et al.  Temporal Segmentation of Facial Behavior , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[61]  Irfan A. Essa,et al.  Structure from Statistics - Unsupervised Activity Analysis using Suffix Trees , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[62]  Ronald Poppe,et al.  Vision-based human motion analysis: An overview , 2007, Comput. Vis. Image Underst..

[63]  Jessica K. Hodgins,et al.  Aligned Cluster Analysis for temporal segmentation of human motion , 2008, 2008 8th IEEE International Conference on Automatic Face & Gesture Recognition.

[64]  Takumi Kobayashi,et al.  Motion image segmentation using global criteria and DP , 2008, 2008 8th IEEE International Conference on Automatic Face & Gesture Recognition.

[65]  Philippe Beaudoin,et al.  Motion-motif graphs , 2008, SCA '08.

[66]  Zaïd Harchaoui,et al.  Kernel Change-point Analysis , 2008, NIPS.

[67]  Michael I. Jordan,et al.  Nonparametric Bayesian Learning of Switching Linear Dynamical Systems , 2008, NIPS.

[68]  Michael I. Jordan,et al.  Sharing Features among Dynamical Systems with Beta Processes , 2009, NIPS.

[69]  Rama Chellappa,et al.  Unsupervised view and rate invariant clustering of video sequences q , 2009 .

[70]  Quoc V. Le,et al.  Learning hierarchical invariant spatio-temporal features for action recognition with independent subspace analysis , 2011, CVPR 2011.

[71]  Fernando De la Torre,et al.  Joint segmentation and classification of human actions in video , 2011, CVPR 2011.

[72]  Fernando De la Torre,et al.  A Least-Squares Framework for Component Analysis , 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.