A shape-based similarity measure for time series data with ensemble learning

This paper introduces a shape-based similarity measure for time series data, called the angular metric for shape similarity (AMSS). Unlike most similarity or dissimilarity measures, AMSS is based not on the individual data points of a time series but on vectors that equivalently represent it. AMSS treats a time series as a vector sequence in order to focus on the shape of the data, and it compares data shapes using a variant of cosine similarity. By design, AMSS is expected to be robust to time and amplitude shifting and scaling, but sensitive to short-term oscillations. To address this potential drawback, ensemble learning that integrates data smoothing is adopted when AMSS is used for classification. Evaluative experiments reveal distinct properties of AMSS and demonstrate its effectiveness, compared with existing measures, when it is applied in the ensemble framework.
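The idea of comparing vector sequences rather than data points can be illustrated with a minimal sketch. It is not the AMSS definition, which the abstract does not give. It assumes unit time steps, equal-length series, and index-by-index alignment, and it uses plain cosine similarity rather than the paper's variant. The function names (`to_vectors`, `shape_similarity`, `moving_average`) are hypothetical and chosen for illustration only.

```python
import math


def to_vectors(series, dt=1.0):
    """Represent a series as 2-D vectors (dt, y[i+1] - y[i]).

    Assumption: uniform sampling with time step dt.
    """
    return [(dt, b - a) for a, b in zip(series, series[1:])]


def cosine(u, v):
    """Plain cosine similarity of two 2-D vectors."""
    dot = u[0] * v[0] + u[1] * v[1]
    return dot / (math.hypot(*u) * math.hypot(*v))


def shape_similarity(x, y):
    """Mean cosine similarity of index-aligned vectors.

    Illustration only: assumes equal-length series and no alignment step.
    """
    vx, vy = to_vectors(x), to_vectors(y)
    if len(vx) != len(vy) or not vx:
        raise ValueError("series must have equal length of at least 2")
    return sum(cosine(u, v) for u, v in zip(vx, vy)) / len(vx)


def moving_average(series, window=3):
    """Simple smoothing: mean over each full window of the given size."""
    return [
        sum(series[i:i + window]) / window
        for i in range(len(series) - window + 1)
    ]


# A linear trend, the same trend shifted in amplitude, and an oscillating copy.
base = [float(i) for i in range(10)]
shifted = [v + 5.0 for v in base]
noisy = [v + (0.5 if i % 2 else -0.5) for i, v in enumerate(base)]

sim_shifted = shape_similarity(base, shifted)
sim_noisy = shape_similarity(base, noisy)
sim_smoothed = shape_similarity(moving_average(base), moving_average(noisy))
```

In this toy setting an amplitude shift leaves the vectors unchanged, so `sim_shifted` is 1 up to rounding. The short-term oscillation lowers `sim_noisy`, and smoothing both series first recovers part of the lost similarity. That is the motivation the abstract gives for combining the measure with data smoothing.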
