An Invariance-guided Stability Criterion for Time Series Clustering Validation

Time series clustering is a challenging task due to the specific nature of this type of data. Temporal correlation and invariance to transformations such as shifting, warping or noise prevent the direct use of standard data mining methods. Time series clustering has mostly been studied from the angle of finding efficient algorithms and distance measures adapted to time series data, while much less attention has been devoted to the general problem of model selection. Clustering stability has emerged as a universal and model-agnostic principle for clustering model selection. This principle can be stated as follows: an algorithm should find a structure in the data that is resilient to perturbation by sampling or noise. We propose to apply stability analysis to time series by leveraging prior knowledge of the nature and invariances of the data. These invariances determine the perturbation process used to assess stability. Building on a recently introduced criterion that combines between-cluster and within-cluster stability, we propose an invariance-guided method for model selection that is applicable to a wide range of clustering algorithms. Experiments conducted on artificial and benchmark data sets demonstrate the ability of our criterion to discover structure and select the correct number of clusters, provided that the data invariances are known beforehand.
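The core idea of perturbing the data only with transformations the series are assumed to be invariant to, then measuring how consistently a clustering is recovered, can be sketched as follows. This is a minimal illustration, not the authors' criterion: it scores each K by the mean pairwise adjusted Rand index (ARI) between k-means partitions of perturbed copies, whereas the criterion described above combines between-cluster and within-cluster stability. The invariance set (a small circular time shift plus Gaussian noise), the toy data, and every function name and parameter value are hypothetical choices made for illustration.

```python
import numpy as np


def perturb(X, rng, max_shift=3, noise_std=0.2):
    """Hypothetical invariance set: a random circular time shift per series
    (up to max_shift steps) followed by additive Gaussian noise."""
    shifts = rng.integers(-max_shift, max_shift + 1, size=len(X))
    shifted = np.stack([np.roll(x, s) for x, s in zip(X, shifts)])
    return shifted + rng.normal(0.0, noise_std, size=X.shape)


def kmeans(X, k, rng, n_init=10, n_iter=100):
    """Lloyd's k-means with k-means++ seeding; keeps the restart with the
    lowest inertia (sum of squared distances to the assigned centroid)."""
    best_labels, best_inertia = None, np.inf
    for _ in range(n_init):
        # k-means++: each new centre is drawn with probability proportional to D^2.
        centres = [X[rng.integers(len(X))]]
        for _ in range(1, k):
            d2 = ((X[:, None, :] - np.array(centres)[None]) ** 2).sum(-1).min(axis=1)
            centres.append(X[rng.choice(len(X), p=d2 / d2.sum())])
        centres = np.array(centres)
        for _ in range(n_iter):
            labels = ((X[:, None, :] - centres[None]) ** 2).sum(-1).argmin(axis=1)
            # Empty clusters keep their previous centre.
            new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                            else centres[j] for j in range(k)])
            if np.allclose(new, centres):
                break
            centres = new
        labels = ((X[:, None, :] - centres[None]) ** 2).sum(-1).argmin(axis=1)
        inertia = ((X - centres[labels]) ** 2).sum()
        if inertia < best_inertia:
            best_labels, best_inertia = labels, inertia
    return best_labels


def adjusted_rand_index(a, b):
    """Adjusted Rand index between two label vectors of equal length."""
    def pairs(x):
        return x * (x - 1) / 2.0

    _, a_idx = np.unique(a, return_inverse=True)
    _, b_idx = np.unique(b, return_inverse=True)
    table = np.zeros((a_idx.max() + 1, b_idx.max() + 1))
    np.add.at(table, (a_idx, b_idx), 1)  # contingency table
    sum_ij = pairs(table).sum()
    sum_a = pairs(table.sum(axis=1)).sum()
    sum_b = pairs(table.sum(axis=0)).sum()
    expected = sum_a * sum_b / pairs(len(a))
    max_index = 0.5 * (sum_a + sum_b)
    if max_index == expected:  # degenerate case, e.g. both partitions trivial
        return 1.0
    return (sum_ij - expected) / (max_index - expected)


def stability(X, k, n_perturb=10, seed=0):
    """Mean pairwise ARI between clusterings of independently perturbed copies.
    Row order is preserved, so label vectors are directly comparable."""
    rng = np.random.default_rng(seed)
    labelings = [kmeans(perturb(X, rng), k, rng) for _ in range(n_perturb)]
    scores = [adjusted_rand_index(labelings[i], labelings[j])
              for i in range(n_perturb) for j in range(i + 1, n_perturb)]
    return float(np.mean(scores))


def make_toy_data(n_per_cluster=20, length=50, seed=1):
    """Hypothetical toy data: three groups sharing a slow sine wave and
    differing only by a constant level."""
    rng = np.random.default_rng(seed)
    t = np.linspace(0, 2 * np.pi, length)
    return np.array([level + np.sin(t) + rng.normal(0.0, 0.3, length)
                     for level in (-3.0, 0.0, 3.0) for _ in range(n_per_cluster)])


if __name__ == "__main__":
    X = make_toy_data()
    for k in range(2, 6):
        print(k, round(stability(X, k), 3))
```

On this toy data, K = 3 should reach a score at or near 1, since the three level-shifted groups are well separated relative to the shift and noise perturbations. Because ARI ignores label permutations, no label matching is needed between runs. A plain agreement score of this kind can still mislead, for example when a smaller K merges groups in the same way on every perturbed copy, which is one motivation for a criterion that also accounts for within-cluster stability.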
