Maximum Margin Temporal Clustering

Temporal Clustering (TC) refers to the factorization of multiple time series into a set of non-overlapping segments, each of which belongs to one of k temporal clusters. Existing methods based on extensions of generative models such as k-means or Switching Linear Dynamical Systems (SLDS) often lead to intractable inference and lack a mechanism for feature selection, which is critical when dealing with high-dimensional data. To overcome these limitations, this paper proposes Maximum Margin Temporal Clustering (MMTC). MMTC simultaneously determines the start and end of each segment while learning a multiclass Support Vector Machine (SVM) to discriminate among temporal clusters. MMTC extends Maximum Margin Clustering in two ways: first, it incorporates the notion of TC, and second, it introduces additional constraints to achieve a better balance between clusters. Experiments on clustering human actions and bee dancing motions illustrate the benefits of our approach compared to state-of-the-art methods.
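To make the segmentation step concrete, the following is a minimal sketch of one way segment boundaries and cluster labels could be chosen jointly once per-frame cluster scores are available, for example from a linear multiclass classifier. It is not the paper's algorithm: the function name `segment_with_scores`, the additive segment score (sum of frame scores), and the length bounds `min_len` and `max_len` are all assumptions made for illustration. The sketch uses dynamic programming to find the non-overlapping segmentation with the highest total score.

```python
import numpy as np

def segment_with_scores(frame_scores, min_len, max_len):
    """Split frames [0, T) into non-overlapping labeled segments.

    frame_scores: (T, k) array, score of each frame under each of k clusters.
    A segment's score for label c is assumed to be the sum of its frame
    scores for c (an illustrative choice, not taken from the paper).
    Returns a list of (start, end, label) tuples, end exclusive.
    """
    T, k = frame_scores.shape
    # prefix[t, c] = sum of frame_scores[:t, c], so any segment sum is O(k).
    prefix = np.vstack([np.zeros((1, k)), np.cumsum(frame_scores, axis=0)])

    best = np.full(T + 1, -np.inf)  # best[t]: best total score covering [0, t)
    best[0] = 0.0
    back = [None] * (T + 1)         # back[t]: (start, label) of the last segment

    for end in range(1, T + 1):
        for length in range(min_len, min(max_len, end) + 1):
            start = end - length
            if np.isinf(best[start]):
                continue  # [0, start) cannot be segmented under the bounds
            seg = prefix[end] - prefix[start]
            label = int(np.argmax(seg))
            total = best[start] + seg[label]
            if total > best[end]:
                best[end] = total
                back[end] = (start, label)

    if back[T] is None:
        raise ValueError("no valid segmentation for these length bounds")

    # Walk the back-pointers from the end of the series to recover segments.
    segments = []
    end = T
    while end > 0:
        start, label = back[end]
        segments.append((start, end, label))
        end = start
    return segments[::-1]
```

In an alternating scheme, a routine like this would be interleaved with retraining the classifier on the current segment labels. That outer loop, the margin-based objective, and the cluster-balance constraints mentioned in the abstract are not shown here.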
