Semi-Supervised Time Point Clustering for Multivariate Time Series

The formation and analysis of clusters in multivariate time series can reveal interesting patterns and complex correlations in temporal data. However, traditional clustering methods based on distance metrics fall short of discovering the interpretable characteristics and structures reflected by these clusters. This paper presents a new method for semi-supervised time point clustering based on the temporal proximity of time points and the correlation of their corresponding values. For this purpose, we utilize CoExDBSCAN, a recently developed density-based clustering algorithm with constrained expansion. CoExDBSCAN identifies clusters of temporal neighbourhoods that are only expanded with regard to a priori constraints in defined subspaces. Adapting this algorithm to time series data and grouping segments with similar correlations allows us to find accurate and interpretable structures. We provide a comparison with state-of-the-art methods and a verification of our approach on a synthetic dataset, together with an experimental evaluation on a real-world dataset. The experimental assessment shows that our clustering results can further serve as an effective basis for time series classification.
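The idea of expanding temporal neighbourhoods only under a correlation constraint can be illustrated with a small sketch. This is not the CoExDBSCAN implementation; it is a simplified, hypothetical stand-in that assumes two variables sampled at regular time steps. The function names (`local_correlation`, `cluster_time_points`) and the parameters `eps`, `min_pts`, `half_window`, and `tol` are assumptions made for illustration only.

```python
import numpy as np
from collections import deque


def local_correlation(X, half_window):
    """Pearson correlation of the two columns of X in a window around each time point."""
    n = len(X)
    r = np.zeros(n)
    for t in range(n):
        lo, hi = max(0, t - half_window), min(n, t + half_window + 1)
        a, b = X[lo:hi, 0], X[lo:hi, 1]
        # Leave r[t] at 0 when a window is constant and correlation is undefined.
        if a.std() > 0 and b.std() > 0:
            r[t] = np.corrcoef(a, b)[0, 1]
    return r


def cluster_time_points(X, eps=2, min_pts=3, half_window=5, tol=0.5):
    """Simplified density-based clustering with constrained expansion.

    Neighbourhoods are defined on the time axis only (index distance <= eps).
    A neighbour joins a cluster only if its local correlation differs from
    that of the cluster's seed point by at most tol (the assumed constraint).
    Returns one integer label per time point; -1 marks unassigned points.
    """
    n = len(X)
    r = local_correlation(X, half_window)
    labels = np.full(n, -1)
    cluster = 0
    for seed in range(n):
        if labels[seed] != -1:
            continue
        lo, hi = max(0, seed - eps), min(n, seed + eps + 1)
        # Core-point check; with regular sampling this holds almost everywhere.
        if hi - lo < min_pts:
            continue
        labels[seed] = cluster
        queue = deque([seed])
        while queue:
            p = queue.popleft()
            for q in range(max(0, p - eps), min(n, p + eps + 1)):
                # Constrained expansion: temporal neighbour AND similar correlation.
                if labels[q] == -1 and abs(r[q] - r[seed]) <= tol:
                    labels[q] = cluster
                    queue.append(q)
        cluster += 1
    return labels


# Synthetic example: the correlation flips from positive to negative at t = 50.
t = np.arange(100)
x = np.sin(0.3 * t)
y = np.where(t < 50, x, -x)
X = np.column_stack([x, y])
labels = cluster_time_points(X)
```

In this toy setting, time points well inside the positively correlated segment receive one label and those well inside the negatively correlated segment receive another, while points near the change at t = 50 may form small transitional clusters. The actual algorithm defines its constraints and subspaces differently; the sketch only shows how a constraint can gate cluster expansion.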
