Time series representation and similarity based on local autopatterns

Interest in time series data mining has grown with the increasing volume of temporal data in domains such as medicine, finance, and multimedia. Representations matter because they reduce dimensionality and support useful similarity measures. Earlier work considered high-level representations such as Fourier transforms, wavelets, and piecewise polynomial models, and, more recently, autoregressive kernels were introduced to capture the similarity between time series. We introduce a novel approach to modeling the dependency structure in time series that generalizes the concept of autoregression to local autopatterns. The approach produces a pattern-based representation together with a similarity measure called learned pattern similarity (LPS). It is built on a tree-based ensemble-learning strategy that is fast and insensitive to parameter settings, and a robust similarity measure is then derived from the learned patterns. Because the approach is unsupervised, the representation and similarity measure apply to a range of data mining tasks (e.g., clustering, anomaly detection, classification). Furthermore, the representation is learned in an embedded manner, which avoids the pre-defined features and separate extraction step common in some feature-based approaches. The method generalizes in a straightforward manner to multivariate time series. We evaluate the effectiveness of LPS on time series classification problems from various domains and compare it to eleven well-known similarity measures. Our experimental results show that LPS provides fast and competitive results on benchmark datasets from several domains. Furthermore, LPS offers a research direction and template approach that breaks from linear dependency models and may foster other promising nonlinear approaches.
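To make the idea of a tree-based, pattern-based representation more concrete, the following is a minimal sketch in Python and not the authors' implementation. It assumes that local patterns are overlapping fixed-width windows, that each randomized regression tree predicts one randomly chosen lag from the others, that a series is represented by counts of its windows per leaf, and that similarity is a normalized histogram intersection. All function names, parameters, and the toy data are hypothetical choices made for illustration.

```python
import math
import random

def windows(series, width):
    """All overlapping subsequences of length `width` (assumed local patterns)."""
    return [series[i:i + width] for i in range(len(series) - width + 1)]

def sse(values):
    """Sum of squared deviations from the mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def grow(rows, target, depth, rng, counter):
    """Grow a randomized regression tree that predicts lag `target` from the other lags."""
    best = None
    if depth > 0 and len(rows) >= 4:
        features = [j for j in range(len(rows[0])) if j != target]  # needs width >= 2
        for _ in range(5):  # a few random candidate splits, keep the best one
            j = rng.choice(features)
            lo, hi = min(r[j] for r in rows), max(r[j] for r in rows)
            if lo == hi:
                continue
            t = rng.uniform(lo, hi)
            left = [r for r in rows if r[j] <= t]
            right = [r for r in rows if r[j] > t]
            if not left or not right:
                continue
            cost = sse([r[target] for r in left]) + sse([r[target] for r in right])
            if best is None or cost < best[0]:
                best = (cost, j, t, left, right)
    if best is None:  # no usable split: create a leaf with a fresh id
        leaf_id = counter[0]
        counter[0] += 1
        return ("leaf", leaf_id)
    _, j, t, left, right = best
    return ("split", j, t,
            grow(left, target, depth - 1, rng, counter),
            grow(right, target, depth - 1, rng, counter))

def leaf_of(tree, row):
    """Route one window down the tree and return its leaf id."""
    while tree[0] == "split":
        _, j, t, left, right = tree
        tree = left if row[j] <= t else right
    return tree[1]

def fit_forest(all_series, width, n_trees, depth, seed):
    """Fit trees on the pooled windows of all series; no class labels are used."""
    rng = random.Random(seed)
    rows = [w for s in all_series for w in windows(s, width)]
    forest = []
    for _ in range(n_trees):
        counter = [0]
        target = rng.randrange(width)  # random lag to predict
        forest.append((grow(rows, target, depth, rng, counter), counter[0]))
    return forest

def represent(series, forest, width):
    """Concatenate, over all trees, the number of windows falling in each leaf."""
    vec = []
    for tree, n_leaves in forest:
        counts = [0] * n_leaves
        for w in windows(series, width):
            counts[leaf_of(tree, w)] += 1
        vec.extend(counts)
    return vec

def similarity(a, b):
    """Histogram intersection scaled to [0, 1] (an assumed similarity)."""
    return sum(min(x, y) for x, y in zip(a, b)) / max(sum(a), sum(b))

# Toy data: two phase-shifted sine waves and one sawtooth wave.
sines = [[math.sin(0.3 * t + p) for t in range(60)] for p in (0.0, 0.4)]
saw = [(t % 10) / 5.0 - 1.0 for t in range(60)]
data = sines + [saw]
forest = fit_forest(data, width=5, n_trees=20, depth=4, seed=0)
reps = [represent(s, forest, 5) for s in data]
print(similarity(reps[0], reps[1]), similarity(reps[0], reps[2]))
```

Because the trees are fit on pooled windows without class labels, the same representation could in principle feed a nearest-neighbor classifier, a clustering routine, or an anomaly detector. The window width, tree depth, and number of trees used here are arbitrary.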
