Time-series Clustering with Jointly Learning Deep Representations, Clusters and Temporal Boundaries

Clustering and segmentation of temporal data are important tasks across several fields, with prominent applications in computer vision and machine learning such as face and gesture segmentation. Several related methods have been proposed in the literature for learning temporal boundaries and clusters, and recent works focus on learning deep representations for clustering. However, none of these methods jointly learns segments, clusters, and representations. In this paper, we propose the first methodology that simultaneously discovers suitable deep representations, clusters, and temporal boundaries, with the clustering process providing supervisory cues for updating the temporal boundaries and training the proposed deep learning architecture. We demonstrate the proposed approach on a human motion segmentation task using the CMU-MMAC database, where it achieves the highest normalized mutual information among the clustering algorithms compared.
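For readers unfamiliar with the evaluation metric, normalized mutual information (NMI) between a ground-truth labeling and a predicted clustering can be sketched as below. This is only an illustrative sketch: the abstract does not state which normalization is used, so the geometric-mean form sqrt(H(U) * H(V)) and the function name `normalized_mutual_information` are assumptions, as is the choice to compare one label per frame.

```python
import math
from collections import Counter


def normalized_mutual_information(labels_true, labels_pred):
    """NMI between two labelings of the same frames.

    Assumption: normalization by sqrt(H(U) * H(V)); other variants
    (arithmetic mean, min, max) exist and give different values.
    """
    if len(labels_true) != len(labels_pred):
        raise ValueError("labelings must have the same length")
    n = len(labels_true)
    if n == 0:
        raise ValueError("labelings must be non-empty")

    count_true = Counter(labels_true)
    count_pred = Counter(labels_pred)
    count_joint = Counter(zip(labels_true, labels_pred))

    def entropy(counts):
        # Shannon entropy (natural log) of the empirical distribution.
        return -sum((c / n) * math.log(c / n) for c in counts.values())

    h_true = entropy(count_true)
    h_pred = entropy(count_pred)

    # I(U; V) = sum_{u,v} p(u,v) * log( p(u,v) / (p(u) * p(v)) )
    mutual_info = sum(
        (c / n) * math.log(c * n / (count_true[u] * count_pred[v]))
        for (u, v), c in count_joint.items()
    )

    if h_true == 0.0 or h_pred == 0.0:
        # Convention chosen here: two trivial (single-cluster) labelings
        # agree perfectly; one trivial labeling shares no information.
        return 1.0 if h_true == h_pred else 0.0
    return mutual_info / math.sqrt(h_true * h_pred)
```

NMI is invariant to how cluster indices are named, so a prediction that recovers the true segments under different cluster ids still scores 1, while a prediction that is statistically independent of the ground truth scores 0.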
