On-line Elastic Similarity Measures for time series

Abstract

The way similarity between time series is measured is of paramount importance in many data mining and machine learning tasks. For instance, Elastic Similarity Measures are widely used to determine whether two time series are similar to each other. In off-line time series mining, these measures have proven very effective because they can handle time distortions and mitigate their effect on the resulting distance. In the on-line setting, where data arrive continuously and not necessarily in a stationary manner, stream mining approaches must be fast, use limited memory and adapt to different stationary intervals. The computational complexity of Elastic Similarity Measures and their lack of flexibility to accommodate different stationary intervals make them incompatible with these requirements. To overcome these issues, this paper adapts the family of Elastic Similarity Measures, which includes Dynamic Time Warping, Edit Distance, Edit Distance for Real Sequences and Edit Distance with Real Penalty, to the on-line setting. The proposed adaptation rests on two main ideas: a forgetting mechanism and incremental computation. The former makes the similarity consistent with the characteristics of streaming time series by giving more importance to recent observations, whereas the latter reduces the computational complexity by avoiding unnecessary computations. Two experiments were carried out to assess the behavior of the proposed similarity measures in on-line settings. The first aims to show the efficiency of the proposed adaptation; to this end, we measure and compare the computation time of the elastic measures and of their on-line adaptations.
The second experiment analyzes the results of a distance-based streaming machine learning model to show the effect of the forgetting mechanism on the resulting similarity value. For each of the aforementioned Elastic Similarity Measures, the experiments show that the proposed adaptation meets the memory, computational complexity and flexibility constraints imposed by streaming data.
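The two ideas of the adaptation, incremental computation and a forgetting mechanism, can be illustrated for Dynamic Time Warping with a minimal Python sketch. It rests on several assumptions not stated above: both streams deliver one sample per time step, the local cost is the absolute difference, and forgetting is realized by multiplying the accumulated cost of the best predecessor cell by a factor `rho` in (0, 1], so that a cell k steps back along the warping path is weighted by `rho**k`. This is one plausible formulation, not necessarily the exact one used in the paper, and the names `OnlineDTW`, `update` and `dtw_full` are hypothetical. The incremental part comes from keeping only the last row and last column of the cumulative cost matrix: each new pair of samples requires O(n) new cells instead of recomputing all O(n^2) of them.

```python
import math
from typing import List


def _local_cost(a: float, b: float) -> float:
    # Assumed local distance between two observations.
    return abs(a - b)


def dtw_full(x: List[float], y: List[float], rho: float = 1.0) -> float:
    """Reference off-line recomputation of the whole cost matrix: O(n*m) per call."""
    n, m = len(x), len(y)
    D = [[math.inf] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            prev = min(
                D[i - 1][j] if i > 0 else math.inf,
                D[i][j - 1] if j > 0 else math.inf,
                D[i - 1][j - 1] if i > 0 and j > 0 else math.inf,
            )
            # Assumed forgetting: the accumulated past cost is damped by rho.
            D[i][j] = _local_cost(x[i], y[j]) + (0.0 if prev == math.inf else rho * prev)
    return D[n - 1][m - 1]


class OnlineDTW:
    """Hypothetical incremental DTW with exponential forgetting (sketch)."""

    def __init__(self, rho: float = 1.0) -> None:
        if not 0.0 < rho <= 1.0:
            raise ValueError("rho must lie in (0, 1]")
        self.rho = rho
        self.x: List[float] = []
        self.y: List[float] = []
        self.row: List[float] = []  # last row of the cost matrix: D[n-1][0..n-1]
        self.col: List[float] = []  # last column of the cost matrix: D[0..n-1][n-1]

    def _cell(self, i: int, j: int, up: float, left: float, diag: float) -> float:
        # Same recurrence as dtw_full, given the three predecessor cells.
        prev = min(up, left, diag)
        damped = 0.0 if prev == math.inf else self.rho * prev
        return _local_cost(self.x[i], self.y[j]) + damped

    def update(self, a: float, b: float) -> float:
        """Append one sample to each stream and return the current distance D[n][n]."""
        self.x.append(a)
        self.y.append(b)
        n = len(self.x) - 1  # index of the new row and the new column
        inf = math.inf

        # New column j = n (rows 0..n-1), built from the stored last column.
        new_col: List[float] = []
        for i in range(n):
            up = new_col[i - 1] if i > 0 else inf
            diag = self.col[i - 1] if i > 0 else inf
            new_col.append(self._cell(i, n, up, self.col[i], diag))

        # New row i = n (columns 0..n-1), built from the stored last row.
        new_row: List[float] = []
        for j in range(n):
            left = new_row[j - 1] if j > 0 else inf
            diag = self.row[j - 1] if j > 0 else inf
            new_row.append(self._cell(n, j, self.row[j], left, diag))

        # Corner cell D[n][n] closes the new row and column.
        if n > 0:
            corner = self._cell(n, n, new_col[-1], new_row[-1], self.row[-1])
        else:
            corner = self._cell(0, 0, inf, inf, inf)

        # Only the boundary of the matrix is kept, not the full n x n matrix.
        self.row = new_row + [corner]
        self.col = new_col + [corner]
        return corner


if __name__ == "__main__":
    odtw = OnlineDTW(rho=1.0)
    for a, b in [(0.0, 1.0), (1.0, 1.0)]:
        dist = odtw.update(a, b)
    print(dist)  # 1.0
```

With `rho=1.0` the sketch reproduces the standard DTW recurrence, so each call to `update` returns the same value as `dtw_full` on the prefixes seen so far; with `rho < 1` alignments far in the past contribute geometrically less, and the distance never exceeds the undamped one. Note that the sketch still stores both series in full, since the new row and column need every past sample.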