Comparing Time-Series Clustering Algorithms in R Using the dtwclust Package

Most clustering strategies have not changed considerably since their initial definition. The common improvements relate either to the distance measure used to assess dissimilarity or to the function used to calculate prototypes. Time-series clustering is no exception, and the Dynamic Time Warping distance is particularly popular in that context. This distance is computationally expensive, so many related optimizations have been developed over the years. Since no single clustering algorithm can be said to perform best on all datasets, different strategies must be tested and compared, and a common infrastructure for doing so can be advantageous. In this manuscript, a general overview of shape-based time-series clustering is given, including many specifics related to Dynamic Time Warping and other recently proposed techniques. At the same time, a description of the dtwclust package for the R statistical software is provided, showcasing how it can be used to evaluate many different time-series clustering procedures.
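To make the cost of Dynamic Time Warping concrete, the following is a minimal sketch of the classic dynamic-programming formulation, written in plain Python for illustration. It is not the dtwclust API and not the implementation described in the manuscript. The function name `dtw_distance`, the absolute-difference local cost, and the optional `window` argument (a Sakoe-Chiba style band) are assumptions chosen for this example.

```python
def dtw_distance(x, y, window=None):
    """Illustrative DTW distance between two numeric sequences.

    Assumptions (not taken from the source): absolute difference as the
    local cost, a symmetric step pattern, and an optional band of
    half-width `window` that limits how far the warping path may stray
    from the diagonal.
    """
    n, m = len(x), len(y)
    if window is None:
        window = max(n, m)  # no constraint: the full n-by-m grid is filled
    # The band must be at least as wide as the length difference,
    # otherwise no warping path can reach the final cell.
    window = max(window, abs(n - m))

    inf = float("inf")
    # cost[i][j] holds the cheapest alignment of x[:i] with y[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0

    for i in range(1, n + 1):
        lo = max(1, i - window)
        hi = min(m, i + window)
        for j in range(lo, hi + 1):
            local = abs(x[i - 1] - y[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # x advances, y repeats
                cost[i][j - 1],      # y advances, x repeats
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


# Two series with the same shape but different lengths.
a = [0, 1, 2, 1, 0]
b = [0, 0, 1, 2, 1, 0]
unconstrained = dtw_distance(a, b)
banded = dtw_distance(a, b, window=1)
```

Without a window, the double loop visits every cell of the grid, so the running time grows with the product of the two series lengths. That quadratic cost, repeated for every pair of series in a dataset, is the expense that optimizations such as window constraints aim to reduce.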
