Distance Measures for Time Series in R: The TSdist Package

Defining a distance measure between time series is crucial for many time series data mining tasks, such as clustering and classification. For this reason, a large number of time series distance measures have been published in recent years. This paper presents the TSdist package, which provides a unified framework for calculating what is, to the best of our knowledge, the largest variety of time series dissimilarity measures currently available in R. The package implements several popular distance measures that were not previously available in R, and it also provides wrappers for measures already included in other R packages. In addition, TSdist supports applying these distance measures to clustering and classification tasks, which allows their performance to be evaluated and compared directly in both settings.
